CRUSH algorithm
The CRUSH algorithm enables the Ceph Storage Cluster to scale, rebalance, and recover dynamically.
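Using the CRUSH algorithm, Ceph calculates which placement group (PG) should contain an object, and then calculates which Ceph OSD Daemon should store that placement group.
The CRUSH Map holds the following information: the list of cluster devices, the list of buckets, the failure domain hierarchy, and the rules defined over failure domains that are applied when data is stored. It can be dumped with: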
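The two-step calculation (object to placement group, placement group to OSDs) can be sketched as follows. This is a minimal illustration, not Ceph's implementation: it assumes SHA-256 in place of Ceph's rjenkins hash, a plain modulo in place of Ceph's stable modulo, and rendezvous hashing in place of CRUSH itself. The function names `stable_hash`, `object_to_pg`, and `pg_to_osds` are hypothetical.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Deterministic 32-bit hash; an assumed stand-in for Ceph's rjenkins hash."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "big")


def object_to_pg(pool_id: int, object_name: str, pg_num: int) -> str:
    """Step 1: hash the object name into one of the pool's pg_num placement groups."""
    pg_index = stable_hash(object_name) % pg_num
    return f"{pool_id}.{pg_index:x}"


def pg_to_osds(pg_id: str, osds: list[str], replicas: int) -> list[str]:
    """Step 2: deterministically pick `replicas` distinct OSDs for a PG.

    Rendezvous hashing is used here as a simplified substitute for CRUSH.
    """
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pg_id}/{osd}"), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    osds = ["osd.0", "osd.1", "osd.2", "osd.3"]
    pg = object_to_pg(pool_id=1, object_name="my-object", pg_num=128)
    print(pg, pg_to_osds(pg, osds, replicas=3))
```

The point of the sketch is that both steps are pure calculations: any client that knows the pool parameters and the device list arrives at the same answer without asking a central lookup service.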
# ceph osd crush dump
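Contents of the CRUSH Map
The CRUSH Map contains many things, but mainly the following four parts: Device, Bucket Type, Bucket, and Rule.
Device
The list of all devices in the cluster, located at the beginning of the CRUSH Map: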
# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
...
To map PGs to OSDs, CRUSH needs the list of OSD devices.
Bucket Type
...
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
...
Bucket
host laptop-asus-k53sd {
    id -7    # do not change unnecessarily
    id -8 class ssd    # do not change unnecessarily
    # weight 1.819
    alg straw2
    hash 0    # rjenkins1
    item osd.3 weight 1.819
}
host pc-amd64-100249 {
    ...
}
host pc-amd64-100254 {
    ...
}
root default {
    id -1    # do not change unnecessarily
    id -4 class ssd    # do not change unnecessarily
    # weight 2.529
    alg straw2
    hash 0    # rjenkins1
    item pc-amd64-100254 weight 0.300
    item pc-amd64-100249 weight 0.409
    item laptop-asus-k53sd weight 1.819
}
...
Rules
...
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
...
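To make the three steps of replicated_rule concrete, here is a rough sketch of how they could be evaluated. It rests on several assumptions: the hierarchy in `TREE` (host-a, host-b, host-c and their OSDs) is hypothetical, SHA-256 stands in for rjenkins1, and hosts that were already chosen are simply excluded, whereas real CRUSH retries on a collision. The selection function follows the straw2 idea that each item draws ln(u) / weight and the largest draw wins.

```python
import hashlib
import math

# Hypothetical hierarchy: bucket name -> {child name: weight}. Illustrative only.
TREE = {
    "default": {"host-a": 2.0, "host-b": 1.0, "host-c": 1.0},
    "host-a": {"osd.0": 1.0, "osd.1": 1.0},
    "host-b": {"osd.2": 1.0},
    "host-c": {"osd.3": 1.0},
}


def unit_hash(*parts) -> float:
    """Map the inputs to a deterministic number in (0, 1] (assumed stand-in for rjenkins1)."""
    digest = hashlib.sha256(":".join(map(str, parts)).encode("utf-8")).digest()
    return (int.from_bytes(digest[:4], "big") + 1) / 2**32


def straw2_choose(items: dict, pg_id: str, attempt: int) -> str:
    """Pick one child: each draws ln(u) / weight, the largest draw wins.

    Weights must be positive; heavier items win proportionally more often.
    """
    return max(items, key=lambda name: math.log(unit_hash(pg_id, name, attempt)) / items[name])


def replicated_rule(tree: dict, pg_id: str, replicas: int) -> list:
    """Evaluate: step take default / step chooseleaf firstn 0 type host / step emit."""
    candidates = dict(tree["default"])                 # step take default
    result = []
    for attempt in range(min(replicas, len(candidates))):
        host = straw2_choose(candidates, pg_id, attempt)   # choose a distinct host ...
        del candidates[host]                           # simplification: exclude, do not retry
        result.append(straw2_choose(tree[host], pg_id, attempt))  # ... then one OSD leaf in it
    return result                                      # step emit


if __name__ == "__main__":
    print(replicated_rule(TREE, "1.2a", replicas=3))
```

Because `chooseleaf ... type host` picks distinct hosts before descending to OSDs, no two replicas of a PG land on the same host, which is what makes the host the failure domain. In `firstn 0`, the 0 means "as many as the pool's replica count", passed here as `replicas`.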
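Relationship between the parts
Device, Bucket Type, and Bucket together describe the structure of the storage devices. The structure is a tree with two kinds of levels: nodes (buckets) and leaves (devices).
A Rule references Devices, Bucket Types, and Buckets to control how data in a storage pool is stored: replication, placement, and attributes.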
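The tree described above can be modelled as a small data structure. The sketch below copies only the buckets and weights that are visible in the Bucket excerpt. The two hosts whose bodies are elided are kept as buckets with no known children, and the helper names `leaves` and `bucket_weight` are hypothetical.

```python
# Bucket name -> {item name: weight}, taken from the excerpt in the Bucket section.
# The two hosts shown as "{ ... }" in the source are left empty: their contents are unknown.
BUCKETS = {
    "default": {
        "pc-amd64-100254": 0.300,
        "pc-amd64-100249": 0.409,
        "laptop-asus-k53sd": 1.819,
    },
    "laptop-asus-k53sd": {"osd.3": 1.819},
    "pc-amd64-100249": {},
    "pc-amd64-100254": {},
}


def leaves(name: str) -> list:
    """Return the devices reachable from `name`; any name that is not a bucket is a leaf."""
    if name not in BUCKETS:
        return [name]
    found = []
    for child in BUCKETS[name]:
        found.extend(leaves(child))
    return found


def bucket_weight(name: str) -> float:
    """Sum the weights of the items listed directly inside a bucket."""
    return sum(BUCKETS[name].values())


if __name__ == "__main__":
    print(leaves("default"))                    # only the devices visible in the excerpt
    print(round(bucket_weight("default"), 3))   # sum of the three host item weights
```

Summing the three host items gives 2.528, while the map's own comment says 2.529; the difference is presumably due to rounding of the displayed per-item weights.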