GlusterFS is an open-source distributed file system capable of scaling to several petabytes and handling thousands of clients. It has a modular, stackable design and a unique no-metadata-server architecture, which provides better performance, linear scalability, and reliability. GlusterFS can be flexibly combined with commodity physical, virtual, and cloud resources to deliver highly available, high-performance enterprise storage at a fraction of the cost of traditional solutions.
GlusterFS clusters storage building blocks together over InfiniBand RDMA and/or TCP/IP interconnects, aggregating disk and memory resources and managing data in a single global namespace.
GlusterFS aggregates various storage servers over network interconnects into one large parallel network file system. Based on a stackable user-space design, it delivers strong performance for diverse workloads. The POSIX-compatible GlusterFS servers use any on-disk file system that supports extended attributes (e.g. ext4, XFS) to store data on disk, and the resulting volumes can be accessed using industry-standard protocols including Network File System (NFS) and Server Message Block (SMB).
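As a rough illustration of how such a volume is assembled in practice, the sketch below drives the standard gluster command-line tool from Python to create and start a small replicated volume. The host names, brick paths, and volume name are assumptions for the example, and the bricks are presumed to already exist on an xattr-capable file system such as XFS.

```python
# Minimal sketch (not an official example): provisioning a replica-3
# GlusterFS volume by invoking the standard `gluster` CLI from Python.
# Host names, brick paths, and the volume name are illustrative assumptions.
import subprocess

def run(cmd):
    """Echo a command, run it, and raise if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Join the other servers into the trusted storage pool (run once, from server1).
run(["gluster", "peer", "probe", "server2.example.com"])
run(["gluster", "peer", "probe", "server3.example.com"])

# Create a 3-way replicated volume from one brick per server, then start it.
# Each brick directory must sit on a file system that supports extended
# attributes, such as XFS or ext4.
run(["gluster", "volume", "create", "gv0", "replica", "3",
     "server1.example.com:/data/brick1/gv0",
     "server2.example.com:/data/brick1/gv0",
     "server3.example.com:/data/brick1/gv0"])
run(["gluster", "volume", "start", "gv0"])

# Clients can now mount the single global namespace with the native client:
#   mount -t glusterfs server1.example.com:/gv0 /mnt/gv0
# or reach the same volume over NFS or SMB.
```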
1 Gluster vs Ceph
There are fundamental differences in approach between Ceph and Gluster. Ceph is, at its core, an object-store system called RADOS, with a set of gateway APIs that present the data in block, file, and object modes. The topology of a Ceph cluster is designed around replication and information distribution, which are intrinsic to the design and provide data integrity.
Red Hat describes Gluster as a scale-out NAS and object store. It uses a hashing algorithm to place data within the storage pool, much as Ceph does. This is the key to scaling in both cases. The hashing algorithm is distributed to all of the servers, allowing each of them to work out where a particular data item should be kept. As a result, data can be replicated easily, and the absence of central metadata means there is no access bottleneck of the kind that can occur with Hadoop's HDFS, where a central NameNode holds the metadata.
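The placement idea can be illustrated with a toy sketch: every participant runs the same hash function over a file name and maps the result onto the brick list, so nothing needs to consult a metadata server. This is only an approximation of Gluster's elastic hashing (which assigns per-directory hash ranges to bricks via extended attributes), and the brick names below are invented for the example.

```python
# Toy illustration of hash-based placement, not Gluster's actual algorithm:
# any server or client can compute the owning brick locally, with no
# central metadata server to ask.
import hashlib

BRICKS = ["server1:/brick", "server2:/brick", "server3:/brick"]  # assumed pool

def brick_for(filename: str, bricks=BRICKS) -> str:
    """Hash the file name into a 32-bit space and give each brick
    an equal slice of that space."""
    h = int.from_bytes(hashlib.sha1(filename.encode()).digest()[:4], "big")
    slice_size = 2**32 // len(bricks)
    return bricks[min(h // slice_size, len(bricks) - 1)]

for name in ("report.pdf", "video.mp4", "notes.txt"):
    print(name, "->", brick_for(name))
```

Because the mapping is a pure function of the name and the brick list, adding the same function to every node keeps placement decisions consistent without any coordination at read or write time.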
Ceph and Gluster have similar data distribution capabilities. Ceph stripes data across large node-sets, like most object storage software. This aims to prevent bottlenecks in storage accesses.
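The following sketch shows the striping idea in miniature, using the 64KB unit size cited in the next paragraph and invented node names: an object is cut into fixed-size stripe units and spread round-robin across a set of storage nodes, so no single node absorbs the whole stream.

```python
# Illustrative sketch of striping; the 64KB unit and the node names are
# assumptions for the example, not a description of Ceph's internals.
STRIPE_UNIT = 64 * 1024          # bytes per stripe unit
NODES = ["osd.0", "osd.1", "osd.2", "osd.3"]

def stripe(data: bytes, unit=STRIPE_UNIT, nodes=NODES):
    """Yield (node, offset, chunk) triples for a round-robin striping layout."""
    for i in range(0, len(data), unit):
        yield nodes[(i // unit) % len(nodes)], i, data[i:i + unit]

payload = bytes(300 * 1024)      # a 300KB object
for node, offset, chunk in stripe(payload):
    print(f"{node}: offset={offset}, {len(chunk)} bytes")
```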
Because the default block size for Ceph is small (64KB), the data stream fragments into a large number of random IO operations. Disk drives can generally sustain only a limited number of random IOs per second (typically 150 or fewer for an HDD). Just as importantly, that number does not change much as the transfer size increases, so a larger IO size moves considerably more data in aggregate than a small one.
Gluster uses a default value of 128KB. That larger default size is the primary reason Red Hat claims to outperform Ceph by three to one in benchmarking tests. Such results are largely an artifact of configuration and setup, and a little tuning would have brought the two much closer together. Ceph can change its chunk size from 64KB to 256KB or even 1MB, and doing so would probably have given Ceph the performance edge.
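The arithmetic behind this argument is easy to check. Taking the roughly 150 random IOs per second per HDD figure quoted above, per-drive throughput scales almost linearly with chunk size; the sketch below works through the sizes mentioned here (the labels simply restate the article's claims, not independently verified defaults).

```python
# Back-of-the-envelope throughput for an IOPS-limited HDD: if the drive
# delivers ~150 random IOs per second regardless of transfer size,
# throughput is roughly IOPS * chunk size.
IOPS_LIMIT = 150  # figure from the text; real drives vary

for label, chunk_kb in [("Ceph default (claimed)", 64),
                        ("Gluster default (claimed)", 128),
                        ("Ceph tuned", 256),
                        ("Ceph tuned", 1024)]:
    mb_per_s = IOPS_LIMIT * chunk_kb / 1024
    print(f"{label:26s} {chunk_kb:5d} KB -> ~{mb_per_s:6.1f} MB/s per drive")
```

Under these assumptions, 64KB chunks yield roughly 9MB/s per drive against about 19MB/s for 128KB chunks, while 256KB or 1MB chunks push the same drive to roughly 38MB/s or 150MB/s, which is why the benchmark gap says more about configuration than about the file systems themselves.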