Cloud data storage is widely used, not just by cloud-native compute environments but also by on-premises computer systems and applications. That is because cloud data storage brings tremendous benefits, most of which are difficult or impossible for non-cloud data solutions to match. These include, but are not limited to:
• Virtually unlimited capacity
• The ease of provisioning
• Being on-demand
• Data durability (through the distribution and replication powered by the cloud)
• Cost efficiency
• Reduced or eliminated maintenance overhead
AWS Storage Gateway is a common storage service for hybrid environments that span on-premises systems and the AWS cloud, whether the environment is in the transition phase of a migration to the cloud or the on-premises and cloud sides will continue to coexist and work together for the foreseeable future.
AWS Storage Gateway provides local caching, which gives on-premises applications lower-latency access to cloud data storage.
The different flavours of AWS Storage Gateway support different protocols:
File Gateway: NFS, SMB
Volume Gateway: iSCSI
Tape Gateway: iSCSI VTL
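As a small illustration, the list above can be written as a lookup table. This is only a sketch: the names `GATEWAY_PROTOCOLS` and `gateways_supporting` are made up for this post and are not part of any AWS SDK.

```python
from typing import Dict, List, Tuple

# Protocols per gateway type, exactly as listed above.
# The dictionary and function names are hypothetical, for illustration only.
GATEWAY_PROTOCOLS: Dict[str, Tuple[str, ...]] = {
    "File Gateway": ("NFS", "SMB"),
    "Volume Gateway": ("iSCSI",),
    "Tape Gateway": ("iSCSI VTL",),
}


def gateways_supporting(protocol: str) -> List[str]:
    """Return the gateway types that list `protocol` (case-insensitive, exact match)."""
    wanted = protocol.strip().lower()
    return [
        name
        for name, protocols in GATEWAY_PROTOCOLS.items()
        if wanted in (p.lower() for p in protocols)
    ]


if __name__ == "__main__":
    for proto in ("NFS", "SMB", "iSCSI", "iSCSI VTL"):
        print(proto, "->", gateways_supporting(proto))
```

The match is deliberately exact, so asking for iSCSI returns only Volume Gateway and does not also pull in the Tape Gateway's iSCSI VTL interface.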
The ‘front-end’ of the storage gateway service can be an on-premises virtual machine, powered by VMware ESXi, Microsoft Hyper-V, or Linux KVM; or can be in the form of a hardware appliance. (There are a couple of other options, such as the use of an EC2 instance, for additional designated use cases.)
The following AWS diagram depicts the AWS Storage Gateway service:
Amazon S3 is a commonly used ‘backend’ storage for the AWS Storage Gateway. That is, the data (files) is stored in S3 buckets as objects while the access to the data is through the storage gateway ‘frontend’, which provides the protocol interface and the caching.
The range of AWS storage services is truly impressive, with 10+ offerings, each of which may have multiple flavours. Another storage service that can link to S3 is Amazon FSx for Lustre.
Lustre is an open-source file system. It was developed about twenty years ago by Peter Braam, who worked at Carnegie Mellon University at the time. The Lustre file system uses a parallel distributed architecture, typically for large-scale cluster computing. Interestingly, the history of Lustre crossed paths with that of OpenZFS, another type of high-performance data storage, which this blog discussed in a previous piece.
A Lustre file system, depending on the implementation, may offer storage capacity of hundreds of petabytes (1 PB = 1,000 TB = 1,000,000 GB) and throughput of terabytes (TB) per second, distributed across hundreds of servers. It is the most widely used file system among the 500 fastest computers in the world.
The name Lustre gives its characteristics away: it is a portmanteau of Linux and cluster.
Amazon FSx for Lustre, AWS's fully managed cloud service for Lustre, supports concurrent access to the same file or directory from thousands of compute instances, with consistently low latencies for file operations.
FSx for Lustre integrates natively with Amazon S3, and easy import and export of S3 data is one of its strong features. One or multiple S3 buckets can be linked to an FSx for Lustre file system.
Once the linkage is in place, FSx for Lustre transparently presents S3 objects as files and allows users of FSx for Lustre to write results back to S3. The linkage works in both directions. The linked FSx for Lustre file system is automatically updated as objects are added to, changed in, or deleted from the S3 bucket(s). FSx for Lustre also automatically tracks file system changes and keeps the linked S3 bucket updated as files are added, modified, or deleted. The parallel data-transfer technologies and architecture that Lustre is renowned for enable fast data transfer between FSx for Lustre and S3.
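To put those units in perspective, here is a rough back-of-the-envelope sketch using the decimal conversions quoted above. The 100 PB and 1 TB/s figures in the example are illustrative picks from the "hundreds of petabytes" and "terabytes per second" ranges, not measurements of any particular system.

```python
# Decimal unit conversions, as used in the text above.
TB_PER_PB = 1_000
GB_PER_TB = 1_000


def pb_to_gb(petabytes: float) -> float:
    """Convert petabytes to gigabytes (1 PB = 1,000 TB = 1,000,000 GB)."""
    return petabytes * TB_PER_PB * GB_PER_TB


def seconds_to_stream(capacity_pb: float, throughput_tb_per_s: float) -> float:
    """Time to read the full capacity once at a sustained aggregate throughput."""
    return capacity_pb * TB_PER_PB / throughput_tb_per_s


if __name__ == "__main__":
    capacity_pb = 100        # illustrative value
    throughput_tb_s = 1      # illustrative value
    seconds = seconds_to_stream(capacity_pb, throughput_tb_s)
    print(f"{capacity_pb} PB = {pb_to_gb(capacity_pb):,.0f} GB")
    print(f"Full read at {throughput_tb_s} TB/s: {seconds:,.0f} s (~{seconds / 3600:.1f} h)")
```

Even at a sustained terabyte per second, reading a 100 PB file system end to end would take more than a day, which is why spreading the data over hundreds of servers matters.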
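For readers who script their infrastructure, the two-way linkage described above can be sketched with boto3, whose FSx client exposes `create_data_repository_association`. This is a minimal sketch rather than a definitive setup: the file system ID, bucket name, prefix and `/ns1` path are hypothetical placeholders, and it assumes a file system deployment type that supports data repository associations. The request is built separately so it can be inspected without AWS credentials.

```python
from typing import Any, Dict

# Event types for automatic import (S3 -> Lustre) and export (Lustre -> S3).
SYNC_EVENTS = ["NEW", "CHANGED", "DELETED"]


def build_link_request(file_system_id: str, bucket: str, prefix: str,
                       file_system_path: str) -> Dict[str, Any]:
    """Build keyword arguments for linking an S3 bucket/prefix to a Lustre path."""
    data_repository_path = f"s3://{bucket}/{prefix}".rstrip("/")
    return {
        "FileSystemId": file_system_id,
        "FileSystemPath": file_system_path,      # directory inside the file system
        "DataRepositoryPath": data_repository_path,
        "S3": {
            # Keep the file system updated as objects change in S3.
            "AutoImportPolicy": {"Events": list(SYNC_EVENTS)},
            # Keep S3 updated as files change in the file system.
            "AutoExportPolicy": {"Events": list(SYNC_EVENTS)},
        },
    }


def create_link(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to AWS. Needs boto3 and valid credentials; not run here."""
    import boto3  # third-party; imported lazily so the builder works without it
    return boto3.client("fsx").create_data_repository_association(**request)


if __name__ == "__main__":
    # All identifiers below are placeholders, not real resources.
    request = build_link_request(
        "fs-0123456789abcdef0", "example-bucket", "results/", "/ns1"
    )
    print(request)
```

Listing all three event types on both policies corresponds to the behaviour described above, where additions, changes and deletions propagate in either direction; a narrower list would limit what is synchronised.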
(Diagram from AWS Website)
Neither AWS Storage Gateway File Gateway nor Amazon FSx for Lustre has to have S3 as the backend. File Gateway can also use Amazon FSx for Windows File Server as the backend, while Amazon FSx for Lustre does not need any linked storage at all. But in the common scenarios that do involve Amazon S3, both can be regarded as a file system that fronts the S3 storage, with significant differences. The table below shows some comparisons:
As we can see, FSx for Lustre scores strongly on throughput and low latency. On the other hand, Storage Gateway File Gateway supports the extremely widely used NFS and SMB protocols.
Both AWS Storage Gateway File Gateway with S3 as the backend and Amazon FSx for Lustre with linked S3 storage are common storage service setups. Their differing characteristics should guide when, and for which use cases, each is considered in cloud solution development.
-- Simon Wang