
A Guide to Deploying and Using Ceph Cloud-Native Storage with Rook (Part 2)


Author: 任少近

Table of Contents

6 Managing Ceph Resource Objects
  6.1 Viewing Services
  6.2 Viewing Jobs
  6.3 Viewing deployments.apps
  6.4 Viewing daemonsets.apps
  6.5 Viewing configmaps
  6.6 Viewing clusterroles.rbac.authorization.k8s.io
  6.7 Viewing clusterrolebindings.rbac.authorization.k8s.io
  6.8 Viewing OSD Pool Information via cephclusters.ceph.rook.io
7 Accessing Ceph
  7.1 The Toolbox Client
  7.2 Accessing Ceph from Kubernetes Nodes
  7.3 Exposing a Port for Web Access
  7.4 Deleting an OSD Deployment
  7.5 Ceph Pools (Multi-Tenancy): Creating a Pool and Setting the PG Count
  7.6 Changing the Dashboard Login Password
8 Installation Errors
  8.1 ceph-common Install Failure with the Quincy Release
9 Troubleshooting
  9.1 The Cluster Reports "daemons have recently crashed" (health: HEALTH_WARN)
  9.2 OSD Down

6 Managing Ceph Resource Objects

6.1 Viewing Services

```bash
[root@k8s-master ~]# kubectl -n rook-ceph get services
NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
rook-ceph-mgr             ClusterIP   10.110.141.201   <none>        9283/TCP            13h
rook-ceph-mgr-dashboard   ClusterIP   10.103.197.146   <none>        8443/TCP            13h
rook-ceph-mon-a           ClusterIP   10.110.163.61    <none>        6789/TCP,3300/TCP   13h
rook-ceph-mon-b           ClusterIP   10.100.49.10     <none>        6789/TCP,3300/TCP   13h
rook-ceph-mon-c           ClusterIP   10.96.193.162    <none>        6789/TCP,3300/TCP   13h
```

6.2 Viewing Jobs

```bash
[root@k8s-master]# kubectl -n rook-ceph get jobs
NAME                               COMPLETIONS   DURATION   AGE
rook-ceph-osd-prepare-k8s-master   1/1           6s         11h
rook-ceph-osd-prepare-k8s-node1    1/1           7s         11h
rook-ceph-osd-prepare-k8s-node2    1/1           7s         11h
rook-ceph-osd-prepare-k8s-node3    1/1           6s         11h
```

6.3 Viewing deployments.apps

```bash
[root@k8s-master]# kubectl -n rook-ceph get deployments.apps
NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
csi-cephfsplugin-provisioner          2/2     2            2           12h
csi-rbdplugin-provisioner             2/2     2            2           12h
rook-ceph-crashcollector-k8s-master   1/1     1            1           12h
rook-ceph-crashcollector-k8s-node1    1/1     1            1           12h
rook-ceph-crashcollector-k8s-node2    1/1     1            1           12h
rook-ceph-crashcollector-k8s-node3    1/1     1            1           12h
rook-ceph-mgr-a                       1/1     1            1           12h
rook-ceph-mgr-b                       1/1     1            1           12h
rook-ceph-mon-a                       1/1     1            1           12h
rook-ceph-mon-b                       1/1     1            1           12h
rook-ceph-mon-c                       1/1     1            1           12h
rook-ceph-operator                    1/1     1            1           12h
rook-ceph-osd-0                       1/1     1            1           12h
rook-ceph-osd-1                       1/1     1            1           12h
rook-ceph-osd-2                       1/1     1            1           12h
rook-ceph-osd-3                       1/1     1            1           12h
```

6.4 Viewing daemonsets.apps

```bash
[root@k8s-master]# kubectl -n rook-ceph get daemonsets.apps
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
csi-cephfsplugin   4         4         4       4            4           <none>          12h
csi-rbdplugin      4         4         4       4            4           <none>          12h
```

6.5 Viewing configmaps

```bash
[root@k8s-master]# kubectl -n rook-ceph get configmaps
NAME                           DATA   AGE
kube-root-ca.crt               1      13h
rook-ceph-csi-config           1      12h
rook-ceph-csi-mapping-config   1      12h
rook-ceph-mon-endpoints        5      12h
rook-ceph-operator-config      33     13h
rook-config-override           1      12h
```

6.6 Viewing clusterroles.rbac.authorization.k8s.io

```bash
[root@k8s-master ~]# kubectl -n rook-ceph get clusterroles.rbac.authorization.k8s.io
NAME                                 CREATED AT
cephfs-csi-nodeplugin                2023-06-13T13:56:29Z
cephfs-external-provisioner-runner   2023-06-13T13:56:29Z
rbd-csi-nodeplugin                   2023-06-13T13:56:29Z
rbd-external-provisioner-runner      2023-06-13T13:56:29Z
rook-ceph-cluster-mgmt               2023-06-13T13:56:29Z
rook-ceph-global                     2023-06-13T13:56:29Z
rook-ceph-mgr-cluster                2023-06-13T13:56:29Z
rook-ceph-mgr-system                 2023-06-13T13:56:29Z
rook-ceph-object-bucket              2023-06-13T13:56:29Z
rook-ceph-osd                        2023-06-13T13:56:29Z
rook-ceph-system                     2023-06-13T13:56:29Z
```

6.7 Viewing clusterrolebindings.rbac.authorization.k8s.io

```bash
kubectl -n rook-ceph get clusterrolebindings.rbac.authorization.k8s.io
cephfs-csi-nodeplugin-role    ClusterRole/cephfs-csi-nodeplugin
cephfs-csi-provisioner-role   ClusterRole/cephfs-external-provisioner-runner
rbd-csi-nodeplugin            ClusterRole/rbd-csi-nodeplugin
rbd-csi-provisioner-role      ClusterRole/rbd-external-provisioner-runner
rook-ceph-global              ClusterRole/rook-ceph-global
rook-ceph-mgr-cluster         ClusterRole/rook-ceph-mgr-cluster
rook-ceph-object-bucket       ClusterRole/rook-ceph-object-bucket
rook-ceph-osd                 ClusterRole/rook-ceph-osd
rook-ceph-system              ClusterRole/rook-ceph-system
```

6.8 Viewing OSD Pool Information via cephclusters.ceph.rook.io

If you manage the Ceph cluster with the Rook Ceph Operator, you can also inspect the Rook custom resource to get information about the OSD pools:

```bash
[root@k8s-master ~]# kubectl -n rook-ceph get cephclusters.ceph.rook.io rook-ceph -o yaml
```
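To avoid reading the whole YAML, individual status fields can be queried with jsonpath. The field path below is an assumption based on common Rook releases; fall back to the full YAML output if it prints nothing:

```bash
# Print only the Ceph health reported in the CephCluster status (field path assumed)
kubectl -n rook-ceph get cephclusters.ceph.rook.io rook-ceph \
  -o jsonpath='{.status.ceph.health}{"\n"}'
```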

7 Accessing Ceph

7.1 The Toolbox Client

Deploy the toolbox:

```bash
cd rook/deploy/examples/
kubectl apply -f toolbox.yaml
```
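Before connecting, it can help to confirm the toolbox Pod is running. The label selector below assumes the default app=rook-ceph-tools label from the upstream toolbox.yaml; adjust it if your manifest differs:

```bash
kubectl -n rook-ceph get pods -l app=rook-ceph-tools
```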

Connect to the Ceph cluster:

```bash
[root@k8s-master ~]# kubectl -n rook-ceph exec -it rook-ceph-tools-7857bc9568-q9fjk -- /bin/bash
bash-4.4$ ceph -s
  cluster:
    id:     e320aa6c-0057-46ad-b2bf-5c49df8eba5a
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 13h)
    mgr: b(active, since 13h), standbys: a
    osd: 4 osds: 4 up (since 13h), 4 in (since 13h)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   45 MiB used, 200 GiB / 200 GiB avail
    pgs:     1 active+clean
```
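Because the Pod's hash suffix changes on every rollout, it can be more convenient to exec through the Deployment instead. This assumes the default Deployment name rook-ceph-tools from toolbox.yaml:

```bash
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
```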

7.2 Accessing Ceph from Kubernetes Nodes

Add ceph.conf and a keyring on the node:

```bash
[root@k8s-master]# mkdir /etc/ceph
[root@k8s-master]# cd /etc/ceph
[root@k8s-master]# vi ceph.conf
[global]
mon_host = 10.110.163.61:6789,10.100.49.10:6789,10.96.193.162:6789
[client.admin]
keyring = /etc/ceph/keyring
[root@k8s-master]# vi keyring
[client.admin]
key = AQCGfYhkeMnEFRAAJnW4jUMwmJz2b1dPvdTOJg==
```
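The mon addresses and the admin key above are specific to each cluster. A sketch of how to pull them out of the Rook objects, assuming the default names rook-ceph-mon-endpoints (ConfigMap) and rook-ceph-mon (Secret), whose data keys can vary between Rook versions:

```bash
# Monitor endpoints
kubectl -n rook-ceph get configmap rook-ceph-mon-endpoints -o jsonpath='{.data.data}'; echo
# client.admin key (the data key "ceph-secret" is assumed; inspect the Secret if it differs)
kubectl -n rook-ceph get secret rook-ceph-mon -o jsonpath='{.data.ceph-secret}' | base64 -d; echo
```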

Verify connectivity against any of the three mon Service addresses listed above:

```bash
telnet 10.110.163.61 6789
```

Add a yum repository:

```ini
[ceph]
name=ceph
baseurl=https://mirrors.aliyun.com/ceph/rpm-quincy/el8/x86_64/
enabled=1
gpgcheck=0
```

Install ceph-common (this installation failed; see section 8.1 for details):

```bash
[root@k8s-master]# yum install -y ceph-common
```

If the installation succeeds, you can run Ceph commands directly on the node, as shown below.
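The original example output is missing here; a minimal check, assuming /etc/ceph/ceph.conf and /etc/ceph/keyring were created as described above, would be:

```bash
[root@k8s-master]# ceph -s         # should match what the toolbox reports
[root@k8s-master]# ceph osd tree   # list the OSDs from the node itself
```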

7.3 Exposing a Port for Web Access

Apply rook/deploy/examples/dashboard-external-https.yaml; this creates a NodePort Service for the dashboard:

```bash
[root@k8s-master examples]# kubectl apply -f rook/deploy/examples/dashboard-external-https.yaml
rook-ceph-mgr-dashboard-external-https   NodePort   10.106.127.224   <none>   8443:31555/TCP
```
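If you need the assigned NodePort programmatically (a convenience not shown in the original), it can be read from the Service spec:

```bash
kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard-external-https \
  -o jsonpath='{.spec.ports[0].nodePort}{"\n"}'
```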

Retrieve the password:

```bash
kubectl -n rook-ceph get secrets rook-ceph-dashboard-password \
  -o jsonpath='{.data.password}' | base64 --decode > rook-ceph-dashboard-password.password
```

The decoded password in this example is G+LIkJwXQ/E*>/P&DbzB.

Open https://192.168.123.194:31555/ in a browser and log in with the username admin.

7.4 Deleting an OSD Deployment

If removeOSDsIfOutAndSafeToRemove: true is set in cluster.yaml, the Rook Operator automatically cleans up the Deployment of an OSD that is out and safe to remove. The default is false.
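For reference, a minimal sketch of where the setting lives in the CephCluster spec, based on the upstream cluster.yaml layout (verify the exact structure against your Rook version):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # Let the operator delete the Deployment of an OSD that is out and safe to remove
  removeOSDsIfOutAndSafeToRemove: true
```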

7.5 Ceph Pools (Multi-Tenancy): Creating a Pool and Setting the PG Count

Data placement is managed at the granularity of pools; if you do not create or specify one, data is stored in the default pool. Creating a pool requires choosing a PG count. As a rule of thumb, plan for roughly 100 PGs per OSD, or follow these guidelines:

- Fewer than 5 OSDs: set pg_num to 128.
- 5 to 10 OSDs: set pg_num to 512.
- 10 to 50 OSDs: set pg_num to 4096.
- More than 50 OSDs: use the pgcalc tool to work out a value.

A pool also needs a CRUSH rule, which is the policy that determines how its data is distributed across the cluster. A pool-creation example follows.
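As an illustration, a replicated pool named test-pool (a hypothetical name) can be created from the toolbox like this:

```bash
# Create the pool with 128 placement groups (pg_num and pgp_num)
ceph osd pool create test-pool 128 128
# Declare what the pool will be used for (rbd, cephfs or rgw)
ceph osd pool application enable test-pool rbd
# List pools to confirm
ceph osd lspools
```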

Beyond creation, you can also adjust a pool's replica count, delete a pool, set pool quotas, rename a pool, and view pool status information, as sketched below.
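A few corresponding commands, again using the hypothetical test-pool (deleting a pool additionally requires mon_allow_pool_delete to be enabled on the monitors):

```bash
ceph osd pool set test-pool size 3                        # replica count
ceph osd pool set-quota test-pool max_bytes 10737418240   # 10 GiB quota
ceph osd pool rename test-pool new-pool                   # rename
ceph osd pool ls detail                                   # status and settings
ceph osd pool delete new-pool new-pool --yes-i-really-really-mean-it
```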

7.6 Changing the Dashboard Login Password

Log in to the toolbox, write the new password to a file, and apply it, then sign in to the dashboard with the new credentials (admin / 1qaz@WSX):

```bash
kubectl -n rook-ceph exec -it rook-ceph-tools-7857bc9568-q9fjk -- bash
bash-4.4$ echo -n '1qaz@WSX' > /tmp/password.txt
bash-4.4$ ceph dashboard ac-user-set-password admin --force-password -i /tmp/password.txt
```

8 Installation Errors

8.1 ceph-common Install Failure with the Quincy Release

Cause: the Aliyun mirror carries no el7 packages for the Quincy release; Quincy is only published for el8, and trying the Octopus release fails the same way. Installing the el8 RPMs on an el7 (CentOS 7) host then fails dependency resolution:

```text
--> Finished Dependency Resolution
Error: Package: 2:libcephfs2-17.2.6-0.el8.x86_64 (ceph)
       Requires: libstdc++.so.6(GLIBCXX_3.4.21)(64bit)
Error: Package: 2:ceph-common-17.2.6-0.el8.x86_64 (ceph)
       Requires: libstdc++.so.6(CXXABI_1.3.11)(64bit)
Error: Package: 2:libcephfs2-17.2.6-0.el8.x86_64 (ceph)
       Requires: libstdc++.so.6(GLIBCXX_3.4.20)(64bit)
Error: Package: 2:ceph-common-17.2.6-0.el8.x86_64 (ceph)
       Requires: libstdc++.so.6(GLIBCXX_3.4.22)(64bit)
......
       Requires: libstdc++.so.6(GLIBCXX_3.4.20)(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
       Requires: libicuuc.so.60()(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
       Requires: libstdc++.so.6(GLIBCXX_3.4.21)(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
       Requires: libstdc++.so.6(CXXABI_1.3.11)(64bit)
Error: Package: 2:librgw2-17.2.6-0.el8.x86_64 (ceph)
       Requires: libthrift-0.13.0.so()(64bit)
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
```
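One possible workaround, which is not part of the original write-up: point the repository at a Ceph release that still publishes el7 packages (for example Nautilus) and install that older ceph-common. The path below assumes the Aliyun mirror follows the download.ceph.com layout, and an old client should be tested against a Quincy cluster before relying on it:

```ini
[ceph]
name=ceph
baseurl=https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/
enabled=1
gpgcheck=0
```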

9 Troubleshooting

9.1 The Cluster Reports "daemons have recently crashed" (health: HEALTH_WARN)

```bash
bash-4.4$ ceph status
  cluster:
    id:     e320aa6c-0057-46ad-b2bf-5c49df8eba5a
    health: HEALTH_WARN
            3 mgr modules have recently crashed

  services:
    mon: 3 daemons, quorum a,b,c (age 23h)
    mgr: b(active, since 23h), standbys: a
    osd: 4 osds: 4 up (since 23h), 4 in (since 23h)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   45 MiB used, 200 GiB / 200 GiB avail
    pgs:     1 active+clean

# Show the detailed health information
bash-4.4$ ceph health detail
HEALTH_WARN 3 mgr modules have recently crashed
[WRN] RECENT_MGR_MODULE_CRASH: 3 mgr modules have recently crashed
```

Ceph's crash module collects crash-dump information from daemons that have crashed and stores it in the cluster for later analysis.

List the recorded crashes:

```bash
bash-4.4$ ceph crash ls
ID                                                                ENTITY  NEW
2023-06-14T13:56:38.064890Z_75a59d8c-9c99-47af-8cef-e632d8f0a010  mgr.b    *
2023-06-14T13:56:53.252095Z_bc44e5d3-67e5-4c22-a872-e9c7f9799f55  mgr.b    *
2023-06-14T13:57:38.564803Z_1f132169-793b-4ac6-a3c7-af48c91f5365  mgr.b    *

# Entries marked with * are new (not yet archived). The warning points at mgr/osd crash
# reports, so check the OSDs and mgr next to see whether the warning is only caused by
# reports that were never archived.
bash-4.4$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         0.19519  root default
-5         0.04880      host k8s-master
 0    ssd  0.04880          osd.0            up   1.00000  1.00000
-3         0.04880      host k8s-node1
 1    ssd  0.04880          osd.1            up   1.00000  1.00000
-9         0.04880      host k8s-node2
 3    ssd  0.04880          osd.3            up   1.00000  1.00000
-7         0.04880      host k8s-node3
 2    ssd  0.04880          osd.2            up   1.00000  1.00000
```

The commands above show the cluster itself is fine, so the warning comes from crash reports that were never archived, i.e. a false alarm. Archive them:

```bash
# Option 1: suitable when only one or two crashes are unarchived
ceph crash ls
ceph crash archive <id>

# Option 2: suitable when many crashes need archiving; used here
ceph crash archive-all
```
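Afterwards the warning should clear; a quick confirmation (a routine check, not shown in the original) is:

```bash
bash-4.4$ ceph crash ls-new   # prints nothing once every crash is archived
bash-4.4$ ceph health         # should return HEALTH_OK
```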

9.2 OSD Down

When an OSD goes down, go straight to its logs. Here, ceph osd tree showed that osd.3, running on k8s-node2, was down; the follow-up checks are sketched below.
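A sketch of the follow-up from the Kubernetes side, assuming the OSD Deployment naming shown in section 6.3 (rook-ceph-osd-3 on k8s-node2):

```bash
# Confirm which OSD is down and where it runs
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree
# Read the failing OSD's recent logs
kubectl -n rook-ceph logs deploy/rook-ceph-osd-3 --tail=100
# After fixing the underlying problem (disk, node), restart the OSD Deployment
kubectl -n rook-ceph rollout restart deploy/rook-ceph-osd-3
```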
