Ceph is a free software storage platform that provides object-, block-, and file-level storage interfaces from a single distributed computer cluster. Ceph aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and free availability.

1 Ceph

Ceph replicates data and makes it fault-tolerant, using commodity hardware and requiring no specific hardware support. As a result of its design, the system is both self-healing and self-managing, aiming to minimize administration time and other costs.

On April 21, 2016, the Ceph development team released "Jewel", the first Ceph release in which CephFS is considered stable. The CephFS repair and disaster recovery tools are feature-complete, although snapshots, multiple active metadata servers, and some other functionality are disabled by default.

Ceph uses the CRUSH algorithm to determine how to store and retrieve data by computing data storage locations. CRUSH lets Ceph clients communicate with object storage daemons (OSDs) directly rather than through a centralized server or broker. With an algorithmically determined method of storing and retrieving data, Ceph avoids a single point of failure, a performance bottleneck, and a physical limit to its scalability.

1.1 CRUSH

CRUSH (Controlled Replication Under Scalable Hashing) is a hash-based algorithm for calculating how and where to store and retrieve data in a distributed object-based storage cluster.

CRUSH distributes data evenly across available object storage devices in what is often described as a pseudo-random manner. Distribution is controlled by a hierarchical cluster map called a CRUSH map. The map, which can be customized by the storage administrator, informs the cluster about the layout and capacity of nodes in the storage network and specifies how redundancy should be managed. By allowing cluster nodes to calculate where a data item has been stored, CRUSH avoids the need to look up data locations in a central directory. CRUSH also allows for nodes to be added or removed, moving as few objects as possible while still maintaining balance across the new cluster configuration.
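The two properties described above, weight-aware pseudo-random placement without a central directory and minimal data movement when a node leaves, can be illustrated with a small sketch. This is not the CRUSH implementation: it uses weighted rendezvous hashing, a simpler technique in the same spirit, and the names `place`, `_unit_hash`, and the `osd.N` labels and weights are hypothetical choices made for this example.

```python
import hashlib
import math


def _unit_hash(key: str, node: str) -> float:
    """Map (key, node) to a repeatable float in (0, 1]."""
    digest = hashlib.sha256(f"{key}/{node}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def place(key: str, weights: dict[str, float], replicas: int = 2) -> list[str]:
    """Return `replicas` distinct nodes for `key`, favouring heavier nodes.

    Every node gets the score ln(u) / weight, where u is the hash above.
    The score is never positive, and a larger weight pulls it towards zero,
    so heavier nodes win more often. Weights must be positive.
    """
    ranked = sorted(
        weights,
        key=lambda node: math.log(_unit_hash(key, node)) / weights[node],
        reverse=True,
    )
    return ranked[:replicas]


# Hypothetical four-node cluster; osd.2 has twice the capacity of the others.
weights = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 1.0}
keys = [f"object-{i}" for i in range(2000)]

# Any client holding the same map computes the same primary node.
before = {k: place(k, weights)[0] for k in keys}

# Remove osd.1 and recompute: only objects that lived on osd.1 should move.
smaller = {n: w for n, w in weights.items() if n != "osd.1"}
after = {k: place(k, smaller)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]

print(f"{len(moved)} of {len(keys)} objects moved")
```

In this sketch the only input a client needs is the weight map, which plays the role of a (much simplified, flat) cluster map. A real CRUSH map is hierarchical and supports placement rules, which this example does not attempt to model.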

1.1.1 Libcrunch

Libcrunch is a lightweight mapping framework, developed by Twitter, that maps data objects to a number of nodes subject to user-specified constraints.

The libcrunch implementation was heavily inspired by the paper on the CRUSH algorithm. Its main features are:

  • flexible cluster topology definition
  • user-defined placement rules
  • supports replication factor (RF) and replica distribution factor (RDF)
  • balanced distribution of data that reflects weights
  • stability against topology changes
  • supports target balancing

1.1.2 CRUSH map

CRUSH runs a quick calculation and assigns an object's location directly, without a lookup.

CRUSH is a pseudo-random placement algorithm: it finds an object "on the fly", without metadata indexing. It is repeatable and deterministic.

CRUSH includes rule-based configuration, such as infrastructure topology awareness (region, site, etc.), adjustable replication (changing the replica policy), and weighting (prioritizing placement based on weight parameters).
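A rough sketch of how topology awareness, adjustable replication, and weighting can combine in one placement rule is shown below. It is an illustration under stated assumptions, not Ceph's CRUSH map format or rule language: the two-level `CLUSTER_MAP`, the site and device names, and the functions `draw`, `choose`, and `apply_rule` are all hypothetical. The rule modelled here is "pick one device in each of N distinct sites".

```python
import hashlib
import math

# Hypothetical two-level topology: site -> {device: weight}.
CLUSTER_MAP = {
    "site-a": {"osd.0": 1.0, "osd.1": 1.0},
    "site-b": {"osd.2": 2.0, "osd.3": 1.0},
    "site-c": {"osd.4": 1.0},
}


def draw(key: str, item: str, weight: float) -> float:
    """Repeatable weighted score for (key, item); larger wins."""
    digest = hashlib.sha256(f"{key}/{item}".encode()).digest()
    u = (int.from_bytes(digest[:8], "big") + 1) / 2**64  # in (0, 1]
    return math.log(u) / weight


def choose(key: str, candidates: dict[str, float], count: int) -> list[str]:
    """Pick `count` distinct candidates, favouring higher weights."""
    ranked = sorted(
        candidates,
        key=lambda item: draw(key, item, candidates[item]),
        reverse=True,
    )
    return ranked[:count]


def apply_rule(key: str, cluster_map: dict, replicas: int) -> list[tuple[str, str]]:
    """Place each replica in a different site, then pick one device per site."""
    # A site's weight is taken to be the sum of its device weights.
    site_weights = {site: sum(devs.values()) for site, devs in cluster_map.items()}
    sites = choose(key, site_weights, replicas)
    return [(site, choose(key, cluster_map[site], 1)[0]) for site in sites]


# Changing `replicas` changes the replica policy without touching the map.
placement = apply_rule("object-42", CLUSTER_MAP, replicas=2)
print(placement)
```

Because the site is chosen before the device, two replicas of the same object never share a site in this sketch, so the loss of one site leaves at least one copy reachable.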