Problem Description
In the early days, when our expertise was limited, we deployed a Single Master Kubernetes Cluster as our daily development environment. As the business expanded and workloads grew, more and more test services came to run there (not only business applications, but also many foundational services).
If the Master Node fails and cannot be recovered, for example because of a damaged disk, we would have no choice but to rebuild the development environment. Restoring every configuration file and resource, and rebuilding and redeploying every service, would be a disaster.
We therefore want to upgrade the development environment from a Single Master Kubernetes Cluster to a Multiple Master Kubernetes Cluster, converting it into a highly available cluster so that a Master failure no longer leaves the cluster unusable.
This note records how to upgrade a single-Master cluster to a multi-Master cluster, i.e. how to convert the cluster into a highly available one.
Solution
We follow the article 如何将单 master 升级为多 master 集群 (backup: backup-article-1697039.png) and adapt its approach to our actual situation.
Requirements Overview
Single Master Kubernetes Cluster  |  Multiple Master Kubernetes Cluster
-----------------------------------------------------------------------
192.168.10.70  k8s70-cp00         |  192.168.10.70  k8s70 (VIP)
                                  |  192.168.10.71  k8s70-cp01
                                  |  192.168.10.72  k8s70-cp02
                                  |  192.168.10.73  k8s70-cp03
192.168.10.74  k8s70-wn04         |  192.168.10.74  k8s70-wn04
192.168.10.75  k8s70-wn05         |  192.168.10.75  k8s70-wn05
We want to:
1) keep the original Master IP address as the VIP, so that the Worker nodes and every service referencing the cluster need no changes (see the check after this list);
2) add three more Master nodes to the cluster;
3) use kube-vip to provide high availability.
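As a sketch of why point 1 matters: on a kubeadm-provisioned Worker, the API server endpoint is baked into /etc/kubernetes/kubelet.conf (and into the other cluster kubeconfigs). Reusing 192.168.10.70 as the VIP keeps those references valid. A minimal check, assuming the default kubeadm file layout:

# On any Worker node: confirm which endpoint the kubelet talks to.
# If it already points at 192.168.10.70:6443, nothing on the Worker has to change after the migration.
<k8s70-wn04># grep 'server:' /etc/kubernetes/kubelet.conf
    server: https://192.168.10.70:6443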
Additional Notes
Master nodes cannot be joined directly to a Single Master Kubernetes Cluster; the attempt fails with:
# kubeadm join 192.168.10.70:6443 --token 2lurqx... --discovery-token-ca-cert-hash sha256:6d4443... --control-plane --certificate-key bd79d2...
[preflight] Running pre-flight checks
        [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.1. Latest validated version: 19.03
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight:
One or more conditions for hosting a new control plane instance is not satisfied.

unable to add a new control plane instance a cluster that doesn't have a stable controlPlaneEndpoint address

Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.

To see the stack trace of this error execute with --v=5 or higher
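The root cause is that a cluster initialized without --control-plane-endpoint has no controlPlaneEndpoint in its uploaded kubeadm configuration. A quick way to confirm this on the existing Master (the field is simply absent when it was never set):

# On k8s70-cp00: no output here means the cluster has no stable controlPlaneEndpoint yet.
# kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' | grep controlPlaneEndpoint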
Process Overview
We took a more aggressive approach, and the implementation differs from the solution in the referenced article:
1) first promote the original Master into a "highly available" control plane in name only (it still has a single endpoint),
2) then add three new Master nodes,
3) then remove the original Master node and power it off,
4) and finally deploy the kube-vip component.
Step 1. Convert to a Highly Available Configuration
# kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' > kubeadm.yaml

# Add controlPlaneEndpoint and a certSANs section under apiServer.
# Only names outside the standard SAN list need to be added here; kubernetes, kubernetes.default,
# and so on are already part of the standard SAN list, so they need not be listed manually.
# vim kubeadm.yaml
...
controlPlaneEndpoint: 192.168.10.70:6443
apiServer:
  certSANs:
  - 192.168.10.70
  - 192.168.10.71
  - 192.168.10.72
  - 192.168.10.73
...

# mv /etc/kubernetes/pki/apiserver.{crt,key} ~
# kubeadm init phase certs apiserver --config kubeadm.yaml
# kubeadm init phase upload-config kubeadm --config kubeadm.yaml
# kubectl delete pods -n kube-system kube-apiserver-k8s70-cp00
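To confirm that the regenerated API server certificate now carries the extra SANs, the certificate can be inspected directly (a quick check, assuming the default /etc/kubernetes/pki layout):

# On k8s70-cp00: the SAN list should now contain IP Address entries for 192.168.10.70 through 192.168.10.73.
# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'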
Step 2. Add the Control-Plane Nodes
Three control-plane nodes need to be added:
# Obtain the certificate key
# kubeadm init phase upload-certs --upload-certs
...
<certificate-key>

# Print the command for joining the cluster as a control plane
# kubeadm token create --print-join-command --certificate-key "<certificate-key>"

# Run the printed kubeadm join command on each new Master node
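After the join command has been run on k8s70-cp01, k8s70-cp02, and k8s70-cp03, the cluster should report four control-plane nodes. A rough sketch of the expected output (AGE values omitted; exact ROLES depend on the kubeadm version):

# kubectl get nodes
NAME         STATUS   ROLES                  AGE   VERSION
k8s70-cp00   Ready    control-plane,master   ...   v1.20.15
k8s70-cp01   Ready    control-plane,master   ...   v1.20.15
k8s70-cp02   Ready    control-plane,master   ...   v1.20.15
k8s70-cp03   Ready    control-plane,master   ...   v1.20.15
k8s70-wn04   Ready    <none>                 ...   v1.20.15
k8s70-wn05   Ready    <none>                 ...   v1.20.15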
Step 3. Remove the Original Control Plane
<k8s70-cp01># kubectl delete nodes k8s70-cp00

<k8s70-cp00># kubeadm reset

<k8s70-cp01># ETCDCTL_API=3 etcdctl \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --endpoints=192.168.10.71:2379 \
    member list
25f77cad09d74e0f, started, k8s70-cp03, https://192.168.10.73:2380, https://192.168.10.73:2379
38e8e2bfca52e926, started, k8s70-cp00, https://192.168.10.70:2380, https://192.168.10.70:2379
499f0d61ecc30313, started, k8s70-cp02, https://192.168.10.72:2380, https://192.168.10.72:2379
6cb2528bfc0445cd, started, k8s70-cp01, https://192.168.10.71:2380, https://192.168.10.71:2379

<k8s70-cp01># ETCDCTL_API=3 etcdctl \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --endpoints=192.168.10.71:2379 \
    member remove 38e8e2bfca52e926
Member 38e8e2bfca52e926 removed from cluster c9249fec0ab061ea
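A final check that the old Master's etcd member is really gone (same etcdctl flags as above); only cp01, cp02, and cp03 should remain, after which k8s70-cp00 can be powered off:

<k8s70-cp01># ETCDCTL_API=3 etcdctl \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    --endpoints=192.168.10.71:2379 \
    member list
25f77cad09d74e0f, started, k8s70-cp03, https://192.168.10.73:2380, https://192.168.10.73:2379
499f0d61ecc30313, started, k8s70-cp02, https://192.168.10.72:2380, https://192.168.10.72:2379
6cb2528bfc0445cd, started, k8s70-cp01, https://192.168.10.71:2380, https://192.168.10.71:2379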
Step 4. Deploy High Availability
Run the following on each of the three Master nodes:
export VIP=192.168.10.70
export INTERFACE=eth0
export KVVERSION=v0.4.0

alias kube-vip="docker run --network host --rm ghcr.io/kube-vip/kube-vip:$KVVERSION"

kube-vip manifest pod \
    --interface $INTERFACE \
    --address $VIP \
    --controlplane \
    --services \
    --arp \
    --leaderElection | tee /etc/kubernetes/manifests/kube-vip.yaml
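kube-vip runs as a static Pod on each Master, and the elected leader advertises the VIP via ARP. A few quick checks that the VIP is actually serving (assuming the eth0 interface used above):

# The kube-vip static Pods should be running on all three Masters:
# kubectl -n kube-system get pods -o wide | grep kube-vip

# On the current leader, the VIP should be bound to the interface:
# ip addr show eth0 | grep 192.168.10.70

# And the API server should answer on the VIP:
# curl -k https://192.168.10.70:6443/healthz
ok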
Additional Notes
kubeadm.yaml
The kubeadm.yaml configuration of the resulting highly available cluster:
# kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}'
apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 192.168.10.70:6443
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.20.15
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
etcd.yaml
# cat /etc/kubernetes/manifests/etcd.yaml
...
    - --advertise-client-urls=https://192.168.10.71:2379
...
    - --initial-cluster=k8s70-cp01=https://192.168.10.71:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.10.71:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.10.71:2380
...
The etcd --initial-cluster flag differs on each node that joins the cluster: it normally accumulates the peer URLs of the members that joined before it (an illustrative example follows below).
Note, however, that kubeadm reset did not remove the node's own member from the etcd cluster, which is why the old Master's member had to be removed manually with etcdctl in Step 3.
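As an illustration of this accumulation (a sketch only, not copied from a real manifest; the actual value depends on the join order and on which members existed at join time), the last node to join might carry something like:

# /etc/kubernetes/manifests/etcd.yaml on k8s70-cp03 (illustrative):
    - --initial-cluster=k8s70-cp01=https://192.168.10.71:2380,k8s70-cp02=https://192.168.10.72:2380,k8s70-cp03=https://192.168.10.73:2380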
References
如何将单 master 升级为多 master 集群 – 云+社区 – 腾讯云