"Rook-Ceph" - Common Issues

  CREATED BY JENKINSBOT

Problem Description

This note records issues related to Rook-Ceph and how to resolve the common ones.

Solution

For common issues, refer to the Troubleshooting section of the Rook Ceph Documentation.

cephosd: skipping device "xxx" because it contains a filesystem "ceph_bluestore"

OSD and MON memory consumption · Issue #5811 · rook/rook · GitHub
Ceph Common Issues – Rook Ceph Documentation

Problem Description

The disk cannot be brought up as an OSD, and the following error is reported:

cephosd: skipping device "sdb" because it contains a filesystem "ceph_bluestore"
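
To see which devices the prepare job skipped and why, the message above can be filtered out of the prepare logs. The following is only a minimal sketch: the helper name skipped_devices is hypothetical, and the label app=rook-ceph-osd-prepare and the container name provision in the usage comment are assumptions about a typical Rook deployment that should be checked against your cluster.

```shell
# Read osd-prepare log lines on stdin and print one "<device> <filesystem>"
# pair for every device that was skipped because it already holds a filesystem.
# (skipped_devices is a hypothetical helper name, not part of Rook.)
skipped_devices() {
  sed -n 's/.*skipping device "\([^"]*\)" because it contains a filesystem "\([^"]*\)".*/\1 \2/p'
}

# Usage against a live cluster (assumed label and container name):
#   kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare -c provision --tail=-1 \
#     | skipped_devices
```

For the error shown above, the helper would print the device name followed by the detected filesystem type.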

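Root Cause Analysis

The logs of the rook-ceph-osd-prepare-xxx Pod show that the sdb disk already carries a ceph_bluestore signature, meaning Ceph has already started processing it.
Looking further, we found that the rook-ceph-osd-prepare-xxx Pod was terminated with an OOMKilled error while it was running, which presumably left the disk only partially prepared.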
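
One way to confirm the OOMKilled observation without reading through every Pod description is to list the termination reasons of the prepare Pods. This is a sketch under assumptions: oom_killed_pods is a hypothetical helper, and the label app=rook-ceph-osd-prepare is assumed to match the prepare Pods in your cluster.

```shell
# Read lines of the form "<pod-name> <reason> [<reason> ...]" on stdin and
# print the name of every pod that has at least one OOMKilled reason.
# (oom_killed_pods is a hypothetical helper name.)
oom_killed_pods() {
  awk '{ for (i = 2; i <= NF; i++) if ($i == "OOMKilled") { print $1; break } }'
}

# Usage against a live cluster (assumed label); both the current and the
# previous container state are printed, since a restarted container keeps
# its OOMKilled reason only in lastState:
#   kubectl -n rook-ceph get pod -l app=rook-ceph-osd-prepare \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].state.terminated.reason}{" "}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
#     | oom_killed_pods
```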

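Solution

1) In the Helm chart values, raise the resource limits to 10 times their current values and leave the requests unchanged.
2) Then reset the disk by following the Cleanup documentation.
3) Finally, restart the Operator so that it probes the disks again: kubectl -n rook-ceph delete pod -l app=rook-ceph-operator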
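
Step 2 could look roughly like the function below. It is a sketch, not the authoritative procedure: zap_disk is a hypothetical helper, it assumes sgdisk, GNU dd, and partprobe are installed on the node that owns the disk, and the Cleanup documentation may call for further steps (for example removing leftover device-mapper entries) depending on how the OSD was created. It destroys all data on the given device.

```shell
# Wipe the partition table and the start of a disk so that Rook can
# provision it again. DESTROYS ALL DATA on the device.
# (zap_disk is a hypothetical helper name; run it on the node owning the disk.)
zap_disk() {
  disk="$1"
  # Refuse anything that is not an existing block device.
  if [ ! -b "$disk" ]; then
    echo "not a block device: $disk" >&2
    return 1
  fi
  # Remove GPT and MBR partition structures.
  sgdisk --zap-all "$disk"
  # Overwrite the first 100 MiB, where the bluestore label lives.
  dd if=/dev/zero of="$disk" bs=1M count=100 oflag=direct,dsync
  # Ask the kernel to re-read the now empty partition table.
  partprobe "$disk"
}

# Usage (device name taken from the error message; verify it first):
#   zap_disk /dev/sdb
#   kubectl -n rook-ceph delete pod -l app=rook-ceph-operator
```

The block-device guard is there so that a mistyped path fails early instead of creating a regular file full of zeros.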

unable to list block devices from: /dev/mapper

[ceph_volume.util.disk][ERROR ] unable to list block devices from: /dev/mapper

timeout expired waiting for volumes to attach or mount for pod "xxxxxxxxx"

Unable to mount volumes for pod "kube-registry-646bc578d9-vwdfd_rook-ceph(4877e2f4-ea8c-11e9-b6c3-005056814b85)": timeout expired waiting for volumes to attach or mount for pod "rook-ceph"/"kube-registry-646bc578d9-vwdfd". list of unmounted volumes=[image-store]. list of unattached volumes=[image-store default-token-dnwrv]

[errno 110] error connecting to the cluster

In Rook-Ceph, running the ceph status command hangs, and after a while it fails with the following error:

[errno 110] error connecting to the cluster
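
On Linux, errno 110 is ETIMEDOUT, so the client gave up waiting for the cluster. While investigating, it can help to bound the wait explicitly instead of letting the command hang. The wrapper below is a small sketch: run_bounded is a hypothetical helper built on the coreutils timeout command, and the rook-ceph-tools deployment in the usage comment is an assumption that the Rook toolbox is installed.

```shell
# Run a command with a time limit in seconds. Returns the command's own exit
# status, or 124 (the status used by coreutils timeout) if the limit was hit.
# (run_bounded is a hypothetical helper name.)
run_bounded() {
  limit="$1"
  shift
  rc=0
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "timed out after ${limit}s: $*" >&2
  fi
  return "$rc"
}

# Usage (assumes the Rook toolbox deployment exists):
#   run_bounded 15 kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
```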