「Kubernetes」- Newly created Pod stuck in the ContainerCreating state

  CREATED BY JENKINSBOT

[Events] x/x k8s nodes are available: x Insufficient pods

Configure maximum Pods per node  |  Google Kubernetes Engine (GKE)

Problem description

The Pod is stuck in the Pending state, and kubectl describe reports:

...
Events:
  Type    Reason     Age        From               Message
  ----    ------     ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 Insufficient pods.

Root cause analysis

A single node can host only a limited number of Pods. By default the limit is 110 Pods per node:

# kubectl describe nodes k8s120-wn100
...
Capacity:
  cpu:                8
  ephemeral-storage:  9974088Ki
  hugepages-2Mi:      0
  memory:             16392456Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  9192119486
  hugepages-2Mi:      0
  memory:             16290056Ki
  pods:               110
...

Solution

Pass --max-pods <num> to the kubelet to control the maximum number of Pods on a single node.
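To see how close a node is to this limit, compare its allocatable Pod count with the number of Pods currently scheduled on it. The sketch below is only an illustration: allocatable_pods is a hypothetical helper name, and the node name is the one from the output above. On clusters where the kubelet reads a configuration file, the same limit is the maxPods field of the KubeletConfiguration. The kubelet has to be restarted for either setting to take effect.

```shell
# Hypothetical helper: read `kubectl describe node <name>` output on stdin
# and print the pod count listed under the "Allocatable:" section.
allocatable_pods() {
  awk '
    /^Allocatable:/ { in_alloc = 1; next }   # entering the section
    in_alloc && /^[^ ]/ { in_alloc = 0 }     # the next top-level key ends it
    in_alloc && $1 == "pods:" { print $2 }
  '
}

# Usage against a live cluster (not executed here):
#   kubectl describe node k8s120-wn100 | allocatable_pods
#   kubectl get pods --all-namespaces --no-headers \
#     --field-selector spec.nodeName=k8s120-wn100 | wc -l
```

If the second number has reached the first, the scheduler reports Insufficient pods for that node.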

ContainerCreating

A newly created Pod stays in the ContainerCreating state.

Searching turned up the article 「Pod 异常排错」 (Troubleshooting abnormal Pods). Its key points are as follows:

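The Pod's sandbox container cannot start properly, and the kubelet log is needed to find the specific cause.
In that article's case, the cause was that the cni0 bridge had been configured with an IP address from a different subnet. Deleting the bridge (the network plugin recreates it automatically) fixes the problem.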
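One way to check for this mismatch is to compare the address on cni0 with the Pod subnet assigned to the node. The sketch below assumes the iproute2 ip tool is installed; iface_cidr is a hypothetical helper that pulls the IPv4 address out of its one-line output.

```shell
# Hypothetical helper: read `ip -o -4 addr show <iface>` output on stdin
# and print the IPv4 address with its prefix length.
iface_cidr() {
  awk '$3 == "inet" { print $4 }'
}

# Usage on the affected node (not executed here):
#   ip -o -4 addr show cni0 | iface_cidr
# If the address lies outside the node's Pod subnet, remove the bridge and
# let the network plugin recreate it:
#   ip link set cni0 down
#   ip link delete cni0
```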

Besides the error above, other possible causes include:

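	Image pull failures, for example
		The wrong image is configured
		The kubelet cannot reach the image registry (access to gcr.io from mainland China needs special handling)
		The credentials for a private image are misconfigured
		The image is too large and the pull times out (consider raising the kubelet --image-pull-progress-deadline and --runtime-request-timeout options)
	CNI network errors, which usually call for checking the CNI plugin configuration, for example
		The Pod network cannot be configured
		No IP address can be allocated
	The container cannot start; check that the correct image was built and that the container arguments are correct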
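Before digging into any single cause, it helps to list which Pods are stuck and read their events. A minimal sketch, assuming the default column layout of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...); stuck_pods is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get pods --all-namespaces` output on stdin
# and print namespace/name for every Pod whose STATUS is ContainerCreating.
stuck_pods() {
  awk 'NR > 1 && $4 == "ContainerCreating" { print $1 "/" $2 }'
}

# Usage against a live cluster (not executed here):
#   kubectl get pods --all-namespaces | stuck_pods
#   kubectl describe pod -n <namespace> <pod>   # Events show the failing step
```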

Next, check the kubelet log:

# journalctl -f -u kubelet.service | grep -i error -C 500 # grep highlights "Error" in red so it is easy to spot
-- Logs begin at Wed 2019-12-04 01:04:12 CST. --
Dec 04 12:05:41 k8s-master2 kubelet[27615]: E1204 12:05:41.726630   27615 kuberuntime_manager.go:605] killPodWithSyncResult failed: failed to "KillPodSandbox" for "c123d775-1646-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:05:41 k8s-master2 kubelet[27615]: E1204 12:05:41.726884   27615 pod_workers.go:186] Error syncing pod c123d775-1646-11ea-b2b2-005056814b85 ("kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system(c123d775-1646-11ea-b2b2-005056814b85)"), skipping: failed to "KillPodSandbox" for "c123d775-1646-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:05:42 k8s-master2 kubelet[27615]: W1204 12:05:42.544445   27615 cni.go:293] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9"
Dec 04 12:05:42 k8s-master2 kubelet[27615]: 2019-12-04 12:05:42.664 [INFO][15340] utils.go 479: Configured environment: [CNI_COMMAND=DEL CNI_CONTAINERID=a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9 CNI_NETNS= CNI_ARGS=IgnoreUnknown=1;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kubernetes-dashboard-7cbc7c7975-b2d4r;K8S_POD_INFRA_CONTAINER_ID=a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9 CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni KUBELET_EXTRA_ARGS=--feature-gates=AttachVolumeLimit=false DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig]
Dec 04 12:05:44 k8s-master2 kubelet[27615]: 2019-12-04 12:05:44.808 [INFO][15145] customresource.go 217: Error getting resource Key=ClusterInformation(default) Name="default" Resource="ClusterInformations" Revision="" error=Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:44 k8s-master2 kubelet[27615]: E1204 12:05:44.810985   27615 cni.go:330] Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:44 k8s-master2 kubelet[27615]: E1204 12:05:44.813039   27615 remote_runtime.go:119] StopPodSandbox "29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-567578c766-hk88x_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:44 k8s-master2 kubelet[27615]: E1204 12:05:44.813149   27615 kuberuntime_manager.go:810] Failed to stop sandbox {"docker" "29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3"}
Dec 04 12:05:44 k8s-master2 kubelet[27615]: E1204 12:05:44.813248   27615 kuberuntime_manager.go:605] killPodWithSyncResult failed: failed to "KillPodSandbox" for "40da0456-15f0-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"coredns-567578c766-hk88x_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:05:44 k8s-master2 kubelet[27615]: E1204 12:05:44.813311   27615 pod_workers.go:186] Error syncing pod 40da0456-15f0-11ea-b2b2-005056814b85 ("coredns-567578c766-hk88x_kube-system(40da0456-15f0-11ea-b2b2-005056814b85)"), skipping: failed to "KillPodSandbox" for "40da0456-15f0-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"coredns-567578c766-hk88x_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:05:45 k8s-master2 kubelet[27615]: W1204 12:05:45.641059   27615 cni.go:293] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3"
Dec 04 12:05:45 k8s-master2 kubelet[27615]: 2019-12-04 12:05:45.722 [INFO][15369] utils.go 479: Configured environment: [CNI_COMMAND=DEL CNI_CONTAINERID=29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3 CNI_NETNS= CNI_ARGS=IgnoreUnknown=1;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=coredns-567578c766-hk88x;K8S_POD_INFRA_CONTAINER_ID=29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3 CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni KUBELET_EXTRA_ARGS=--feature-gates=AttachVolumeLimit=false DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig]
Dec 04 12:05:54 k8s-master2 kubelet[27615]: E1204 12:05:54.017860   27615 pod_workers.go:186] Error syncing pod f9cae75f-1648-11ea-b2b2-005056814b85 ("calico-node-2d8wg_kube-system(f9cae75f-1648-11ea-b2b2-005056814b85)"), skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=calico-node pod=calico-node-2d8wg_kube-system(f9cae75f-1648-11ea-b2b2-005056814b85)"
Dec 04 12:05:57 k8s-master2 kubelet[27615]: 2019-12-04 12:05:57.470 [INFO][15222] customresource.go 217: Error getting resource Key=ClusterInformation(default) Name="default" Resource="ClusterInformations" Revision="" error=Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:57 k8s-master2 kubelet[27615]: E1204 12:05:57.473652   27615 cni.go:330] Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:57 k8s-master2 kubelet[27615]: E1204 12:05:57.474915   27615 remote_runtime.go:119] StopPodSandbox "02709b1f4b280bc4eb167115b84d0eb320746cc25a40c8cbb2ff40149d99346d" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-686495bd6c-qnwvk_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:57 k8s-master2 kubelet[27615]: E1204 12:05:57.474969   27615 kuberuntime_gc.go:153] Failed to stop sandbox "02709b1f4b280bc4eb167115b84d0eb320746cc25a40c8cbb2ff40149d99346d" before removing:rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-686495bd6c-qnwvk_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:57 k8s-master2 kubelet[27615]: W1204 12:05:57.479126   27615 cni.go:293] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "830d31b4cf0b44ae761fadaff787a8ccbf0de22784caa48ab9cf08095a84e1a6"
Dec 04 12:05:57 k8s-master2 kubelet[27615]: 2019-12-04 12:05:57.556 [INFO][15494] utils.go 479: Configured environment: [CNI_COMMAND=DEL CNI_CONTAINERID=830d31b4cf0b44ae761fadaff787a8ccbf0de22784caa48ab9cf08095a84e1a6 CNI_NETNS= CNI_ARGS=IgnoreUnknown=1;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=coredns-686495bd6c-zzg9p;K8S_POD_INFRA_CONTAINER_ID=830d31b4cf0b44ae761fadaff787a8ccbf0de22784caa48ab9cf08095a84e1a6 CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni KUBELET_EXTRA_ARGS=--feature-gates=AttachVolumeLimit=false DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig]
Dec 04 12:06:08 k8s-master2 kubelet[27615]: E1204 12:06:08.017672   27615 pod_workers.go:186] Error syncing pod f9cae75f-1648-11ea-b2b2-005056814b85 ("calico-node-2d8wg_kube-system(f9cae75f-1648-11ea-b2b2-005056814b85)"), skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=calico-node pod=calico-node-2d8wg_kube-system(f9cae75f-1648-11ea-b2b2-005056814b85)"
Dec 04 12:06:12 k8s-master2 kubelet[27615]: 2019-12-04 12:06:12.673 [INFO][15340] customresource.go 217: Error getting resource Key=ClusterInformation(default) Name="default" Resource="ClusterInformations" Revision="" error=Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:12 k8s-master2 kubelet[27615]: E1204 12:06:12.676159   27615 cni.go:330] Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:12 k8s-master2 kubelet[27615]: E1204 12:06:12.677426   27615 remote_runtime.go:119] StopPodSandbox "a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:12 k8s-master2 kubelet[27615]: E1204 12:06:12.677515   27615 kuberuntime_manager.go:810] Failed to stop sandbox {"docker" "a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9"}
Dec 04 12:06:12 k8s-master2 kubelet[27615]: E1204 12:06:12.677614   27615 kuberuntime_manager.go:605] killPodWithSyncResult failed: failed to "KillPodSandbox" for "c123d775-1646-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:06:12 k8s-master2 kubelet[27615]: E1204 12:06:12.677668   27615 pod_workers.go:186] Error syncing pod c123d775-1646-11ea-b2b2-005056814b85 ("kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system(c123d775-1646-11ea-b2b2-005056814b85)"), skipping: failed to "KillPodSandbox" for "c123d775-1646-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:06:13 k8s-master2 kubelet[27615]: W1204 12:06:13.497768   27615 cni.go:293] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9"
Dec 04 12:06:13 k8s-master2 kubelet[27615]: 2019-12-04 12:06:13.583 [INFO][15614] utils.go 479: Configured environment: [CNI_COMMAND=DEL CNI_CONTAINERID=a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9 CNI_NETNS= CNI_ARGS=IgnoreUnknown=1;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kubernetes-dashboard-7cbc7c7975-b2d4r;K8S_POD_INFRA_CONTAINER_ID=a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9 CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni KUBELET_EXTRA_ARGS=--feature-gates=AttachVolumeLimit=false DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig]
Dec 04 12:06:15 k8s-master2 kubelet[27615]: 2019-12-04 12:06:15.728 [INFO][15369] customresource.go 217: Error getting resource Key=ClusterInformation(default) Name="default" Resource="ClusterInformations" Revision="" error=Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:15 k8s-master2 kubelet[27615]: E1204 12:06:15.731558   27615 cni.go:330] Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:15 k8s-master2 kubelet[27615]: E1204 12:06:15.733986   27615 remote_runtime.go:119] StopPodSandbox "29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-567578c766-hk88x_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:15 k8s-master2 kubelet[27615]: E1204 12:06:15.734094   27615 kuberuntime_manager.go:810] Failed to stop sandbox {"docker" "29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3"}
Dec 04 12:06:15 k8s-master2 kubelet[27615]: E1204 12:06:15.734190   27615 kuberuntime_manager.go:605] killPodWithSyncResult failed: failed to "KillPodSandbox" for "40da0456-15f0-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"coredns-567578c766-hk88x_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:06:15 k8s-master2 kubelet[27615]: E1204 12:06:15.734244   27615 pod_workers.go:186] Error syncing pod 40da0456-15f0-11ea-b2b2-005056814b85 ("coredns-567578c766-hk88x_kube-system(40da0456-15f0-11ea-b2b2-005056814b85)"), skipping: failed to "KillPodSandbox" for "40da0456-15f0-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"coredns-567578c766-hk88x_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:06:16 k8s-master2 kubelet[27615]: W1204 12:06:16.586533   27615 cni.go:293] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3"
Dec 04 12:06:16 k8s-master2 kubelet[27615]: 2019-12-04 12:06:16.692 [INFO][15641] utils.go 479: Configured environment: [CNI_COMMAND=DEL CNI_CONTAINERID=29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3 CNI_NETNS= CNI_ARGS=IgnoreUnknown=1;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=coredns-567578c766-hk88x;K8S_POD_INFRA_CONTAINER_ID=29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3 CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni KUBELET_EXTRA_ARGS=--feature-gates=AttachVolumeLimit=false DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig]
Dec 04 12:06:23 k8s-master2 kubelet[27615]: E1204 12:06:23.017810   27615 pod_workers.go:186] Error syncing pod f9cae75f-1648-11ea-b2b2-005056814b85 ("calico-node-2d8wg_kube-system(f9cae75f-1648-11ea-b2b2-005056814b85)"), skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=calico-node pod=calico-node-2d8wg_kube-system(f9cae75f-1648-11ea-b2b2-005056814b85)"
Dec 04 12:06:27 k8s-master2 kubelet[27615]: 2019-12-04 12:06:27.562 [INFO][15494] customresource.go 217: Error getting resource Key=ClusterInformation(default) Name="default" Resource="ClusterInformations" Revision="" error=Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:27 k8s-master2 kubelet[27615]: E1204 12:06:27.565109   27615 cni.go:330] Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:27 k8s-master2 kubelet[27615]: E1204 12:06:27.566898   27615 remote_runtime.go:119] StopPodSandbox "830d31b4cf0b44ae761fadaff787a8ccbf0de22784caa48ab9cf08095a84e1a6" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-686495bd6c-zzg9p_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:27 k8s-master2 kubelet[27615]: E1204 12:06:27.566959   27615 kuberuntime_gc.go:153] Failed to stop sandbox "830d31b4cf0b44ae761fadaff787a8ccbf0de22784caa48ab9cf08095a84e1a6" before removing:rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-686495bd6c-zzg9p_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:27 k8s-master2 kubelet[27615]: W1204 12:06:27.572646   27615 cni.go:293] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "4163c0e3d95328d6a9020ebec435942e6891be8b4156967cab119b887bf55d19"
Dec 04 12:06:27 k8s-master2 kubelet[27615]: 2019-12-04 12:06:27.662 [INFO][15714] utils.go 479: Configured environment: [CNI_COMMAND=DEL CNI_CONTAINERID=4163c0e3d95328d6a9020ebec435942e6891be8b4156967cab119b887bf55d19 CNI_NETNS= CNI_ARGS=IgnoreUnknown=1;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=coredns-686495bd6c-c7f8d;K8S_POD_INFRA_CONTAINER_ID=4163c0e3d95328d6a9020ebec435942e6891be8b4156967cab119b887bf55d19 CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni KUBELET_EXTRA_ARGS=--feature-gates=AttachVolumeLimit=false DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig]

Judging from the errors, the most likely culprit is:

Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout

Our guess is that the API endpoint is unreachable, which prevents Calico from starting properly.
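To confirm that every failure points at the same endpoint, the log can be reduced to the distinct addresses that timed out. A small sketch; timeout_targets is a hypothetical helper, and its pattern simply matches the dial tcp <ip>:<port>: i/o timeout text seen in the log above.

```shell
# Hypothetical helper: read log lines on stdin and print each distinct
# <ip>:<port> that appears in a "dial tcp ...: i/o timeout" message.
timeout_targets() {
  sed -n 's/.*dial tcp \([0-9.]*:[0-9]*\): i\/o timeout.*/\1/p' | LC_ALL=C sort -u
}

# Usage on the affected node (not executed here):
#   journalctl -u kubelet.service --no-pager | timeout_targets
```

In the log above the only such target is 10.96.0.1:443, which in a kubeadm cluster with the default Service CIDR is normally the in-cluster address of the API server.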

We then searched Google for the message Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default.

That led to the issue 「Enabling kube-proxy IPVS mode prevents access to API server via service IP #1461」. Checking the kube-proxy ConfigMap, we found that its mode field was empty (mode: ""). The kube-proxy log then showed that it was sending requests to https://1.2.3.4:6443.
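The two checks described above can be sketched as follows. proxy_settings is a hypothetical helper that pulls the mode and server lines out of the ConfigMap; the k8s-app=kube-proxy label is what kubeadm applies by default and is an assumption about this cluster.

```shell
# Hypothetical helper: read the kube-proxy ConfigMap YAML on stdin and print
# the proxy mode and the API server URL that kube-proxy is configured to use.
proxy_settings() {
  awk '$1 == "mode:" || $1 == "server:" { print $1, $2 }'
}

# Usage against a live cluster (not executed here):
#   kubectl -n kube-system get configmap kube-proxy -o yaml | proxy_settings
#   kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=20
```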

Root cause analysis

The kube-proxy configuration is wrong:

E1204 05:46:21.533391       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Endpoints: Get https://1.2.3.4:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 1.2.3.4:6443: i/o timeout
E1204 05:46:23.780279       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Service: Get https://1.2.3.4:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 1.2.3.4:6443: i/o timeout
E1204 05:46:52.536354       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Endpoints: Get https://1.2.3.4:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 1.2.3.4:6443: i/o timeout

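We do not know why the ConfigMap points at 1.2.3.4. Most likely an earlier failed upgrade was not fully rolled back.

Solution

Edit the kube-proxy ConfigMap (kubectl edit -n kube-system configmaps kube-proxy) and set clusters.cluster.server under the kubeconfig.conf key to the API Server address.

Then restart the affected Pods by deleting them so that they are recreated: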

kubectl delete pod -n kube-system --force --grace-period=0 $( kubectl get pod -n kube-system | grep kube-proxy | awk '{printf "%s ", $1}' )
kubectl delete pod -n kube-system --force --grace-period=0 $( kubectl get pod -n kube-system | grep calico-node | awk '{printf "%s ", $1}' )
kubectl delete pod -n kube-system --force --grace-period=0 $( kubectl get pod -n kube-system | grep coredns | awk '{printf "%s ", $1}' )
kubectl delete pod -n kube-system --force --grace-period=0 $( kubectl get pod -n kube-system | grep kubernetes-dashboard | awk '{printf "%s ", $1}' )
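An alternative to grep and awk is to select the Pods by label. The sketch below only prints the commands so they can be reviewed before running. restart_cmds is a hypothetical helper, and the k8s-app label values are assumptions based on the stock kubeadm, Calico and Dashboard manifests, so verify them first with kubectl get pods -n kube-system --show-labels.

```shell
# Hypothetical helper: print one delete command per component, in the same
# order as the commands above. The label values are assumptions.
restart_cmds() {
  for label in kube-proxy calico-node kube-dns kubernetes-dashboard; do
    echo "kubectl -n kube-system delete pod -l k8s-app=$label"
  done
}

# Usage (not executed here):
#   restart_cmds          # review the commands
#   restart_cmds | sh     # run them once they look right
```

Unlike the commands above, this variant does not force deletion, so Pods get their normal termination grace period.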

… orphaned pod found …

Orphaned pod found – but volume paths are still present on disk · Issue #60987 · kubernetes/kubernetes

The kubelet reports an orphaned pod found error.

Solution: delete the /var/lib/kubelet/pods/<uuid> directory.
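Before removing the directory, it is worth checking that no volume is still mounted beneath it, because deleting through a live mount would also delete the volume's data. A cautious sketch; mounts_under_pod is a hypothetical helper that reads /proc/mounts-style input, and <uuid> remains a placeholder for the Pod UID taken from the kubelet message.

```shell
# Hypothetical helper: read /proc/mounts-style lines on stdin and print every
# mount point located under the directory of the Pod whose UID is "$1".
mounts_under_pod() {
  awk -v dir="/var/lib/kubelet/pods/$1/" 'index($2, dir) == 1 { print $2 }'
}

# Usage on the affected node (not executed here):
#   mounts_under_pod <uuid> < /proc/mounts
# Only if nothing is printed:
#   rm -rf /var/lib/kubelet/pods/<uuid>
```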

… Unable to attach or mount volumes … timed out waiting for the condition …

References

Enabling kube-proxy IPVS mode prevents access to API server via service IP #1461
Enable IPVS Mode in Kube Proxy on a ready Kubernetes Local Cluster
Kubernetes stuck on ContainerCreating