「Kubernetes」- Upgrading a Single-Master Cluster to a Multi-Master Cluster (Single Master to Multiple Master)


Problem Description

Early on, before we knew better, we deployed a Single Master Kubernetes Cluster for day-to-day development. As the business expanded and workloads grew, more and more test services came to run in this environment (not only business applications, but also many foundational services).

If the Master Node were to fail, for example through unrecoverable disk damage, we would have to rebuild the entire development environment. Restoring every configuration file and resource, and rebuilding and redeploying every service, would be a disaster.

We now want to upgrade the development environment's Single Master Kubernetes Cluster to a Multiple Master Kubernetes Cluster, converting it into a highly available cluster so that a Master failure no longer leaves the cluster unusable.

This note records how to upgrade a single-Master cluster to a multi-Master cluster, that is, how to convert the cluster into a highly available one.

Solution

We followed the article 如何将单 master 升级为多 master 集群 ("How to upgrade a single master to a multi-master cluster", see References) and adapted it to our actual situation.

Requirements Overview

Single Master Kubernetes Cluster   |   Multiple Master Kubernetes Cluster
--------------------------------------------------------------------------
192.168.10.70 k8s70-cp00           |   192.168.10.70 k8s70 vip
                                   |
                                   |   192.168.10.71 k8s70-cp01
                                   |   192.168.10.72 k8s70-cp02
                                   |   192.168.10.73 k8s70-cp03
                                   |
192.168.10.74 k8s70-wn04           |   192.168.10.74 k8s70-wn04
192.168.10.75 k8s70-wn05           |   192.168.10.75 k8s70-wn05

Our goals:
1) Reuse the original Master's IP address as the VIP, so that Worker nodes and other services referencing the cluster need no reconfiguration;
2) Add three new Master nodes to the cluster;
3) Use kube-vip to provide the high-availability service.

Additional Notes

A Master node cannot be added directly to a Single Master Kubernetes Cluster; the join attempt fails with:

# kubeadm join 192.168.10.70:6443 --token 2lurqx... --discovery-token-ca-cert-hash sha256:6d4443... --control-plane --certificate-key bd79d2...
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.1. Latest validated version: 19.03
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: 
One or more conditions for hosting a new control plane instance is not satisfied.

unable to add a new control plane instance a cluster that doesn't have a stable controlPlaneEndpoint address

Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.


To see the stack trace of this error execute with --v=5 or higher
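
The error points at the missing controlPlaneEndpoint: a cluster initialized with a single Master has no stable control-plane address for new instances to register against. This can be confirmed via the config map the message itself mentions (on our cluster the field was absent or empty before Step 1 added it):

# kubectl -n kube-system get cm kubeadm-config -o yaml | grep controlPlaneEndpoint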

Process Overview

We took a more aggressive approach than the original article, and our implementation differs from its solution:
1) Promote the original Master to a highly available node (only nominally so, since the controlPlaneEndpoint still points at itself);
2) Join three new Master nodes;
3) Delete the original Master node and power it off;
4) Finally, deploy the kube-vip component.

Step 1: Convert to High Availability

# Export the current cluster configuration:
# kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' > kubeadm.yaml

# Add a controlPlaneEndpoint, and a certSANs list in the apiServer section.
# certSANs only needs entries outside the standard SAN list; kubernetes, kubernetes.default, etc. are standard SANs and need not be added manually.
# vim kubeadm.yaml
...
controlPlaneEndpoint: 192.168.10.70:6443
apiServer:
  certSANs:
  - 192.168.10.70
  - 192.168.10.71
  - 192.168.10.72
  - 192.168.10.73
...

# Back up the old apiserver certificate and key, then regenerate them with the new SANs:
# mv /etc/kubernetes/pki/apiserver.{crt,key} ~
# kubeadm init phase certs apiserver --config kubeadm.yaml

# Upload the updated configuration (including controlPlaneEndpoint) back into the cluster:
# kubeadm init phase upload-config kubeadm --config kubeadm.yaml

# Delete the kube-apiserver mirror pod so it is recreated with the new certificate:
# kubectl delete pods -n kube-system kube-apiserver-k8s70-cp00
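
To confirm that the regenerated certificate actually carries the new SANs, a quick check with openssl (assuming openssl is installed on the node):

# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'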

Step 2: Add Control Plane Nodes

We need to add three control-plane nodes:

# Obtain the certificate key:
# kubeadm init phase upload-certs --upload-certs
...
<certificate-key>

# Obtain the join command:
# kubeadm token create --print-join-command --certificate-key "<certificate-key>"

# Run the printed kubeadm join command on each of the three new Master nodes
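
The printed command will resemble the failed attempt shown earlier, with fresh values (placeholders below, since the real token, hash, and key are cluster-specific):

# kubeadm join 192.168.10.70:6443 --token <token>        \
    --discovery-token-ca-cert-hash sha256:<hash>         \
    --control-plane --certificate-key <certificate-key>

This time the preflight check passes, because the cluster now has a stable controlPlaneEndpoint.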

Step 3: Remove the Original Control Plane

# Delete the old Master's node object from the cluster, then reset it:
<k8s70-cp01># kubectl delete nodes k8s70-cp00
<k8s70-cp00># kubeadm reset

# kubeadm reset does not remove the old member from the etcd cluster, so list and remove it manually:
<k8s70-cp01># ETCDCTL_API=3 etcdctl                      \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt             \
    --cert=/etc/kubernetes/pki/etcd/server.crt           \
    --key=/etc/kubernetes/pki/etcd/server.key            \
    --endpoints=192.168.10.71:2379                       \
    member list
25f77cad09d74e0f, started, k8s70-cp03, https://192.168.10.73:2380, https://192.168.10.73:2379
38e8e2bfca52e926, started, k8s70-cp00, https://192.168.10.70:2380, https://192.168.10.70:2379
499f0d61ecc30313, started, k8s70-cp02, https://192.168.10.72:2380, https://192.168.10.72:2379
6cb2528bfc0445cd, started, k8s70-cp01, https://192.168.10.71:2380, https://192.168.10.71:2379

<k8s70-cp01># ETCDCTL_API=3 etcdctl                      \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt             \
    --cert=/etc/kubernetes/pki/etcd/server.crt           \
    --key=/etc/kubernetes/pki/etcd/server.key            \
    --endpoints=192.168.10.71:2379                       \
    member remove 38e8e2bfca52e926
Member 38e8e2bfca52e926 removed from cluster c9249fec0ab061ea
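
Running member list again (same flags as above) should now show only the three remaining members; this sanity check is worth doing before powering off the old Master:

<k8s70-cp01># ETCDCTL_API=3 etcdctl                      \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt             \
    --cert=/etc/kubernetes/pki/etcd/server.crt           \
    --key=/etc/kubernetes/pki/etcd/server.key            \
    --endpoints=192.168.10.71:2379                       \
    member list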

Step 4: Deploy High Availability

Run the following on each of the three Master nodes:

export VIP=192.168.10.70     # the original Master's IP, reused as the VIP
export INTERFACE=eth0        # the network interface that will carry the VIP
export KVVERSION=v0.4.0      # kube-vip image version

alias kube-vip="docker run --network host --rm ghcr.io/kube-vip/kube-vip:$KVVERSION"

kube-vip manifest pod               \
    --interface $INTERFACE          \
    --address $VIP                  \
    --controlplane                  \
    --services                      \
    --arp                           \
    --leaderElection | tee /etc/kubernetes/manifests/kube-vip.yaml
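
Once the manifest is written, kubelet starts kube-vip as a static pod, and one of the three nodes should claim the VIP. A quick verification (assuming the old Master has already been powered off, so the address is free to claim):

# ping -c 3 192.168.10.70
# kubectl get pods -n kube-system -o wide | grep kube-vip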

Additional Notes

kubeadm.yaml

The kubeadm.yaml configuration of the highly available cluster, for reference:

# kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' 
apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 192.168.10.70:6443
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.20.15
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}

etcd.yaml

# cat /etc/kubernetes/manifests/etcd.yaml 
...
    - --advertise-client-urls=https://192.168.10.71:2379
...
    - --initial-cluster=k8s70-cp01=https://192.168.10.71:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.10.71:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.10.71:2380
...

Each node newly joined to the cluster has a different etcd --initial-cluster value; typically it accumulates the etcd addresses of the nodes that joined before it, as the example below shows.
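
For example, the second node to join might carry something like the following (illustrative only; the exact value depends on the order in which nodes joined):

    - --initial-cluster=k8s70-cp01=https://192.168.10.71:2380,k8s70-cp02=https://192.168.10.72:2380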

However, kubeadm reset does not remove the node's own member from the etcd cluster, which is why Step 3 removes it manually with etcdctl.

References

如何将单 master 升级为多 master 集群 – 云+社区 – 腾讯云