[Events] x/x k8s nodes are available: x Insufficient pods
Configure maximum Pods per node | Google Kubernetes Engine (GKE)
Problem description
The Pod is stuck in the Pending state, and kubectl describe reports:
...
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 Insufficient pods.
Cause analysis
A single node can hold only a limited number of Pods; the default limit is 110 Pods per node:
# kubectl describe nodes k8s120-wn100
...
Capacity:
  cpu:                8
  ephemeral-storage:  9974088Ki
  hugepages-2Mi:      0
  memory:             16392456Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  9192119486
  hugepages-2Mi:      0
  memory:             16290056Ki
  pods:               110
...
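To confirm that a node really is out of Pod slots, one option is to compare its allocatable Pod count with the number of Pods currently bound to it. The following is a minimal sketch, not a definitive procedure: the function names are made up for illustration, and it assumes kubectl is configured against the affected cluster.

```shell
#!/bin/sh
# pods_on_node NODE: count Pods bound to NODE that still occupy a slot.
# Succeeded and Failed Pods are filtered out by the field selector.
pods_on_node() {
  kubectl get pods --all-namespaces --no-headers \
    --field-selector "spec.nodeName=$1,status.phase!=Succeeded,status.phase!=Failed" \
    | wc -l
}

# pods_allocatable NODE: read .status.allocatable.pods from the Node object.
pods_allocatable() {
  kubectl get node "$1" -o jsonpath='{.status.allocatable.pods}'
}

# pod_headroom ALLOCATABLE USED: remaining Pod slots (pure arithmetic).
pod_headroom() {
  echo $(( $1 - $2 ))
}

# Usage (node name taken from the describe output above):
#   pod_headroom "$(pods_allocatable k8s120-wn100)" "$(pods_on_node k8s120-wn100)"
```

A result of 0 would be consistent with the "Insufficient pods" scheduling failure.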
Solution
Pass --max-pods <num> to kubelet to control the maximum number of Pods per node.
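On clusters where kubelet reads a configuration file, the same limit can be set through the maxPods field of KubeletConfiguration, and kubelet itself marks --max-pods as deprecated in favor of the config file. The sketch below assumes the file lives at /var/lib/kubelet/config.yaml (the path kubeadm typically uses) and that maxPods, if present, sits at the top level; set_max_pods is a hypothetical helper name.

```shell
#!/bin/sh
# set_max_pods FILE N: set the top-level maxPods field in a kubelet
# config file, appending it if the field is not there yet.
set_max_pods() {
  file=$1
  n=$2
  if grep -q '^maxPods:' "$file"; then
    tmp=$(mktemp) || return 1
    # Rewrite through a temp file so the original inode and mode are kept.
    sed "s/^maxPods:.*/maxPods: $n/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    # Make sure the file ends with a newline before appending.
    [ -n "$(tail -c1 "$file")" ] && echo >> "$file"
    printf 'maxPods: %s\n' "$n" >> "$file"
  fi
}

# Usage (assumed path; kubelet must be restarted to pick up the change):
#   set_max_pods /var/lib/kubelet/config.yaml 200 && systemctl restart kubelet
```

Raising the limit only helps if the node's Pod CIDR and resources can actually accommodate the extra Pods.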
ContainerCreating
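A newly created Pod stays in the ContainerCreating state.
Searching led us to the article 「Pod 异常排错」 (Troubleshooting abnormal Pods). Its key points are as follows:
In that article's example, the Pod's sandbox container cannot start, and the kubelet log is needed to find out why. There, the cause was a cni0 bridge configured with an IP address from a different subnet; deleting the bridge (the network plugin recreates it automatically) fixes it.
Besides that error, other possible causes include:
- Image pull failure, for example:
  - the wrong image is configured
  - kubelet cannot reach the image (accessing gcr.io from mainland China needs special handling)
  - the secret for a private image is misconfigured
  - the image is too large and the pull times out (consider adjusting kubelet's --image-pull-progress-deadline and --runtime-request-timeout options)
- CNI network errors, which usually call for checking the CNI plugin configuration, for example:
  - the Pod network cannot be configured
  - no IP address can be allocated
- The container cannot start: check that the right image was built and that the container arguments are correct
Next, we checked the kubelet log: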
# journalctl -f -u kubelet.service | grep -i error -C 500 # grep is used so that "Error" is highlighted in red and easy to spot
-- Logs begin at Wed 2019-12-04 01:04:12 CST. --
Dec 04 12:05:41 k8s-master2 kubelet[27615]: E1204 12:05:41.726630 27615 kuberuntime_manager.go:605] killPodWithSyncResult failed: failed to "KillPodSandbox" for "c123d775-1646-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:05:41 k8s-master2 kubelet[27615]: E1204 12:05:41.726884 27615 pod_workers.go:186] Error syncing pod c123d775-1646-11ea-b2b2-005056814b85 ("kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system(c123d775-1646-11ea-b2b2-005056814b85)"), skipping: failed to "KillPodSandbox" for "c123d775-1646-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:05:42 k8s-master2 kubelet[27615]: W1204 12:05:42.544445 27615 cni.go:293] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9"
Dec 04 12:05:42 k8s-master2 kubelet[27615]: 2019-12-04 12:05:42.664 [INFO][15340] utils.go 479: Configured environment: [CNI_COMMAND=DEL CNI_CONTAINERID=a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9 CNI_NETNS= CNI_ARGS=IgnoreUnknown=1;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kubernetes-dashboard-7cbc7c7975-b2d4r;K8S_POD_INFRA_CONTAINER_ID=a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9 CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni KUBELET_EXTRA_ARGS=--feature-gates=AttachVolumeLimit=false DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig]
Dec 04 12:05:44 k8s-master2 kubelet[27615]: 2019-12-04 12:05:44.808 [INFO][15145] customresource.go 217: Error getting resource Key=ClusterInformation(default) Name="default" Resource="ClusterInformations" Revision="" error=Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:44 k8s-master2 kubelet[27615]: E1204 12:05:44.810985 27615 cni.go:330] Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:44 k8s-master2 kubelet[27615]: E1204 12:05:44.813039 27615 remote_runtime.go:119] StopPodSandbox "29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-567578c766-hk88x_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:44 k8s-master2 kubelet[27615]: E1204 12:05:44.813149 27615 kuberuntime_manager.go:810] Failed to stop sandbox {"docker" "29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3"}
Dec 04 12:05:44 k8s-master2 kubelet[27615]: E1204 12:05:44.813248 27615 kuberuntime_manager.go:605] killPodWithSyncResult failed: failed to "KillPodSandbox" for "40da0456-15f0-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"coredns-567578c766-hk88x_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:05:44 k8s-master2 kubelet[27615]: E1204 12:05:44.813311 27615 pod_workers.go:186] Error syncing pod 40da0456-15f0-11ea-b2b2-005056814b85 ("coredns-567578c766-hk88x_kube-system(40da0456-15f0-11ea-b2b2-005056814b85)"), skipping: failed to "KillPodSandbox" for "40da0456-15f0-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"coredns-567578c766-hk88x_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:05:45 k8s-master2 kubelet[27615]: W1204 12:05:45.641059 27615 cni.go:293] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3"
Dec 04 12:05:45 k8s-master2 kubelet[27615]: 2019-12-04 12:05:45.722 [INFO][15369] utils.go 479: Configured environment: [CNI_COMMAND=DEL CNI_CONTAINERID=29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3 CNI_NETNS= CNI_ARGS=IgnoreUnknown=1;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=coredns-567578c766-hk88x;K8S_POD_INFRA_CONTAINER_ID=29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3 CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni KUBELET_EXTRA_ARGS=--feature-gates=AttachVolumeLimit=false DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig]
Dec 04 12:05:54 k8s-master2 kubelet[27615]: E1204 12:05:54.017860 27615 pod_workers.go:186] Error syncing pod f9cae75f-1648-11ea-b2b2-005056814b85 ("calico-node-2d8wg_kube-system(f9cae75f-1648-11ea-b2b2-005056814b85)"), skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=calico-node pod=calico-node-2d8wg_kube-system(f9cae75f-1648-11ea-b2b2-005056814b85)"
Dec 04 12:05:57 k8s-master2 kubelet[27615]: 2019-12-04 12:05:57.470 [INFO][15222] customresource.go 217: Error getting resource Key=ClusterInformation(default) Name="default" Resource="ClusterInformations" Revision="" error=Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:57 k8s-master2 kubelet[27615]: E1204 12:05:57.473652 27615 cni.go:330] Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:57 k8s-master2 kubelet[27615]: E1204 12:05:57.474915 27615 remote_runtime.go:119] StopPodSandbox "02709b1f4b280bc4eb167115b84d0eb320746cc25a40c8cbb2ff40149d99346d" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-686495bd6c-qnwvk_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:57 k8s-master2 kubelet[27615]: E1204 12:05:57.474969 27615 kuberuntime_gc.go:153] Failed to stop sandbox "02709b1f4b280bc4eb167115b84d0eb320746cc25a40c8cbb2ff40149d99346d" before removing:rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-686495bd6c-qnwvk_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:05:57 k8s-master2 kubelet[27615]: W1204 12:05:57.479126 27615 cni.go:293] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "830d31b4cf0b44ae761fadaff787a8ccbf0de22784caa48ab9cf08095a84e1a6"
Dec 04 12:05:57 k8s-master2 kubelet[27615]: 2019-12-04 12:05:57.556 [INFO][15494] utils.go 479: Configured environment: [CNI_COMMAND=DEL CNI_CONTAINERID=830d31b4cf0b44ae761fadaff787a8ccbf0de22784caa48ab9cf08095a84e1a6 CNI_NETNS= CNI_ARGS=IgnoreUnknown=1;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=coredns-686495bd6c-zzg9p;K8S_POD_INFRA_CONTAINER_ID=830d31b4cf0b44ae761fadaff787a8ccbf0de22784caa48ab9cf08095a84e1a6 CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni KUBELET_EXTRA_ARGS=--feature-gates=AttachVolumeLimit=false DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig]
Dec 04 12:06:08 k8s-master2 kubelet[27615]: E1204 12:06:08.017672 27615 pod_workers.go:186] Error syncing pod f9cae75f-1648-11ea-b2b2-005056814b85 ("calico-node-2d8wg_kube-system(f9cae75f-1648-11ea-b2b2-005056814b85)"), skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=calico-node pod=calico-node-2d8wg_kube-system(f9cae75f-1648-11ea-b2b2-005056814b85)"
Dec 04 12:06:12 k8s-master2 kubelet[27615]: 2019-12-04 12:06:12.673 [INFO][15340] customresource.go 217: Error getting resource Key=ClusterInformation(default) Name="default" Resource="ClusterInformations" Revision="" error=Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:12 k8s-master2 kubelet[27615]: E1204 12:06:12.676159 27615 cni.go:330] Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:12 k8s-master2 kubelet[27615]: E1204 12:06:12.677426 27615 remote_runtime.go:119] StopPodSandbox "a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:12 k8s-master2 kubelet[27615]: E1204 12:06:12.677515 27615 kuberuntime_manager.go:810] Failed to stop sandbox {"docker" "a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9"}
Dec 04 12:06:12 k8s-master2 kubelet[27615]: E1204 12:06:12.677614 27615 kuberuntime_manager.go:605] killPodWithSyncResult failed: failed to "KillPodSandbox" for "c123d775-1646-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:06:12 k8s-master2 kubelet[27615]: E1204 12:06:12.677668 27615 pod_workers.go:186] Error syncing pod c123d775-1646-11ea-b2b2-005056814b85 ("kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system(c123d775-1646-11ea-b2b2-005056814b85)"), skipping: failed to "KillPodSandbox" for "c123d775-1646-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"kubernetes-dashboard-7cbc7c7975-b2d4r_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:06:13 k8s-master2 kubelet[27615]: W1204 12:06:13.497768 27615 cni.go:293] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9"
Dec 04 12:06:13 k8s-master2 kubelet[27615]: 2019-12-04 12:06:13.583 [INFO][15614] utils.go 479: Configured environment: [CNI_COMMAND=DEL CNI_CONTAINERID=a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9 CNI_NETNS= CNI_ARGS=IgnoreUnknown=1;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kubernetes-dashboard-7cbc7c7975-b2d4r;K8S_POD_INFRA_CONTAINER_ID=a8d6deb3a425f320ee85d4e562bb7a0164dce98505a03fe8e2e89a48bbf0a5f9 CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni KUBELET_EXTRA_ARGS=--feature-gates=AttachVolumeLimit=false DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig]
Dec 04 12:06:15 k8s-master2 kubelet[27615]: 2019-12-04 12:06:15.728 [INFO][15369] customresource.go 217: Error getting resource Key=ClusterInformation(default) Name="default" Resource="ClusterInformations" Revision="" error=Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:15 k8s-master2 kubelet[27615]: E1204 12:06:15.731558 27615 cni.go:330] Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:15 k8s-master2 kubelet[27615]: E1204 12:06:15.733986 27615 remote_runtime.go:119] StopPodSandbox "29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-567578c766-hk88x_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:15 k8s-master2 kubelet[27615]: E1204 12:06:15.734094 27615 kuberuntime_manager.go:810] Failed to stop sandbox {"docker" "29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3"}
Dec 04 12:06:15 k8s-master2 kubelet[27615]: E1204 12:06:15.734190 27615 kuberuntime_manager.go:605] killPodWithSyncResult failed: failed to "KillPodSandbox" for "40da0456-15f0-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"coredns-567578c766-hk88x_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:06:15 k8s-master2 kubelet[27615]: E1204 12:06:15.734244 27615 pod_workers.go:186] Error syncing pod 40da0456-15f0-11ea-b2b2-005056814b85 ("coredns-567578c766-hk88x_kube-system(40da0456-15f0-11ea-b2b2-005056814b85)"), skipping: failed to "KillPodSandbox" for "40da0456-15f0-11ea-b2b2-005056814b85" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"coredns-567578c766-hk88x_kube-system\" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout"
Dec 04 12:06:16 k8s-master2 kubelet[27615]: W1204 12:06:16.586533 27615 cni.go:293] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3"
Dec 04 12:06:16 k8s-master2 kubelet[27615]: 2019-12-04 12:06:16.692 [INFO][15641] utils.go 479: Configured environment: [CNI_COMMAND=DEL CNI_CONTAINERID=29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3 CNI_NETNS= CNI_ARGS=IgnoreUnknown=1;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=coredns-567578c766-hk88x;K8S_POD_INFRA_CONTAINER_ID=29c4b7a999045677e52cf86f7eac21af3dc8888fc0c51583529c349aad0600a3 CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni KUBELET_EXTRA_ARGS=--feature-gates=AttachVolumeLimit=false DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig]
Dec 04 12:06:23 k8s-master2 kubelet[27615]: E1204 12:06:23.017810 27615 pod_workers.go:186] Error syncing pod f9cae75f-1648-11ea-b2b2-005056814b85 ("calico-node-2d8wg_kube-system(f9cae75f-1648-11ea-b2b2-005056814b85)"), skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=calico-node pod=calico-node-2d8wg_kube-system(f9cae75f-1648-11ea-b2b2-005056814b85)"
Dec 04 12:06:27 k8s-master2 kubelet[27615]: 2019-12-04 12:06:27.562 [INFO][15494] customresource.go 217: Error getting resource Key=ClusterInformation(default) Name="default" Resource="ClusterInformations" Revision="" error=Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:27 k8s-master2 kubelet[27615]: E1204 12:06:27.565109 27615 cni.go:330] Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:27 k8s-master2 kubelet[27615]: E1204 12:06:27.566898 27615 remote_runtime.go:119] StopPodSandbox "830d31b4cf0b44ae761fadaff787a8ccbf0de22784caa48ab9cf08095a84e1a6" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-686495bd6c-zzg9p_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:27 k8s-master2 kubelet[27615]: E1204 12:06:27.566959 27615 kuberuntime_gc.go:153] Failed to stop sandbox "830d31b4cf0b44ae761fadaff787a8ccbf0de22784caa48ab9cf08095a84e1a6" before removing:rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "coredns-686495bd6c-zzg9p_kube-system" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Dec 04 12:06:27 k8s-master2 kubelet[27615]: W1204 12:06:27.572646 27615 cni.go:293] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "4163c0e3d95328d6a9020ebec435942e6891be8b4156967cab119b887bf55d19"
Dec 04 12:06:27 k8s-master2 kubelet[27615]: 2019-12-04 12:06:27.662 [INFO][15714] utils.go 479: Configured environment: [CNI_COMMAND=DEL CNI_CONTAINERID=4163c0e3d95328d6a9020ebec435942e6891be8b4156967cab119b887bf55d19 CNI_NETNS= CNI_ARGS=IgnoreUnknown=1;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=coredns-686495bd6c-c7f8d;K8S_POD_INFRA_CONTAINER_ID=4163c0e3d95328d6a9020ebec435942e6891be8b4156967cab119b887bf55d19 CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin LANG=en_US.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml KUBELET_KUBEADM_ARGS=--cgroup-driver=cgroupfs --network-plugin=cni KUBELET_EXTRA_ARGS=--feature-gates=AttachVolumeLimit=false DATASTORE_TYPE=kubernetes KUBECONFIG=/etc/cni/net.d/calico-kubeconfig]
Judging from the errors, the most likely culprit is:
Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout
Our guess is that the API endpoint is unreachable, which prevents Calico from starting properly.
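With a log this noisy, it can help to reduce it to the list of addresses that actually time out. The filter below is a rough sketch: timeout_targets is a hypothetical helper, and it only recognizes IPv4 targets written exactly as "dial tcp <ip>:<port>: i/o timeout".

```shell
#!/bin/sh
# timeout_targets: read log text on stdin and print "<count> <ip:port>"
# for every address that appears in a dial-timeout message.
timeout_targets() {
  grep -o 'dial tcp [0-9.]*:[0-9]*: i/o timeout' \
    | awk '{ sub(/:$/, "", $3); n[$3]++ } END { for (a in n) print n[a], a }' \
    | sort -rn
}

# Usage on the node:
#   journalctl -u kubelet.service --since "1 hour ago" | timeout_targets
# A quick reachability probe of the Service IP from the same node
# (assumes curl is installed; -k skips certificate verification):
#   curl -k --connect-timeout 5 https://10.96.0.1:443/version
```

If every timeout points at the same Service IP, as it does here, the problem is more likely in the Service routing on the node than in the individual Pods.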
We then searched Google for the message "Error deleting network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default".
That led us to the issue "Enabling kube-proxy IPVS mode prevents access to API server via service IP #1461". We checked and found that the mode field in the kube-proxy ConfigMap was empty (mode: ""). We then looked at the kube-proxy log and saw that it was trying to reach https://1.2.3.4:6443.
Cause analysis
The kube-proxy configuration is wrong:
E1204 05:46:21.533391 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Endpoints: Get https://1.2.3.4:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 1.2.3.4:6443: i/o timeout
E1204 05:46:23.780279 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Service: Get https://1.2.3.4:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 1.2.3.4:6443: i/o timeout
E1204 05:46:52.536354 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Endpoints: Get https://1.2.3.4:6443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 1.2.3.4:6443: i/o timeout
We do not know why the ConfigMap points at 1.2.3.4; our guess is that an earlier upgrade failed and was not fully rolled back.
Solution
Edit the kube-proxy ConfigMap (kubectl edit -n kube-system configmaps kube-proxy) and change clusters.cluster.server under the kubeconfig.conf key to the address of the API server.
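Before and after the edit, it may be worth printing the server address that kube-proxy is actually configured with. The sketch below assumes the ConfigMap stores the kubeconfig under the kubeconfig.conf key, as described above; kubeconfig_servers is a hypothetical helper that simply pulls out every server: value.

```shell
#!/bin/sh
# kubeconfig_servers: print every "server:" value found in the
# kubeconfig text read from stdin.
kubeconfig_servers() {
  sed -n 's/^[[:space:]]*server:[[:space:]]*//p'
}

# Usage against the live ConfigMap (the key contains a dot, hence the
# backslash escape in the jsonpath expression):
#   kubectl -n kube-system get configmap kube-proxy \
#     -o jsonpath='{.data.kubeconfig\.conf}' | kubeconfig_servers
```

In the situation described here, this would be expected to print https://1.2.3.4:6443 before the fix and the real API server address afterwards.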
Then restart the Pods:
kubectl delete pod -n kube-system --force --grace-period=0 $( kubectl get pod -n kube-system | grep kube-proxy | awk '{printf "%s ", $1}' )
kubectl delete pod -n kube-system --force --grace-period=0 $( kubectl get pod -n kube-system | grep calico-node | awk '{printf "%s ", $1}' )
kubectl delete pod -n kube-system --force --grace-period=0 $( kubectl get pod -n kube-system | grep coredns | awk '{printf "%s ", $1}' )
kubectl delete pod -n kube-system --force --grace-period=0 $( kubectl get pod -n kube-system | grep kubernetes-dashboard | awk '{printf "%s ", $1}' )
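Selecting Pods by label instead of grepping their names avoids matching unrelated Pods. The sketch below only prints the commands so they can be reviewed first. The k8s-app label values are assumptions based on the stock kubeadm, Calico, and Dashboard manifests and may differ in a given cluster (check with kubectl get pod -n kube-system --show-labels); restart_cmds is a hypothetical helper name.

```shell
#!/bin/sh
# restart_cmds: print one "delete pod" command per component, selected
# by its (assumed) k8s-app label rather than by a name match.
restart_cmds() {
  for label in kube-proxy calico-node kube-dns kubernetes-dashboard; do
    echo "kubectl -n kube-system delete pod -l k8s-app=$label"
  done
}

# Review the output first, then execute it:
#   restart_cmds
#   restart_cmds | sh
```

Unlike the commands above, this variant does not force-delete, so the Pods get their normal grace period.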
… orphaned pod found …
kubelet reports an "orphaned pod found" error.
Solution: delete the /var/lib/kubelet/pods/<uuid> directory.
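Removing a Pod directory while a volume is still mounted beneath it can destroy data on that volume, so a mount check before deleting seems prudent. This is a cautious sketch rather than an established procedure: still_mounted is a hypothetical helper, and it assumes a Linux node where /proc/mounts lists active mounts.

```shell
#!/bin/sh
# still_mounted UID [MOUNTS_FILE]: succeed if any mount point lies under
# /var/lib/kubelet/pods/UID/. MOUNTS_FILE defaults to /proc/mounts.
still_mounted() {
  grep -q "/var/lib/kubelet/pods/$1/" "${2:-/proc/mounts}"
}

# Usage, with <uuid> replaced by the Pod UID from the kubelet message:
#   if still_mounted <uuid>; then
#     echo "volumes still mounted, unmount them first"
#   else
#     rm -rf /var/lib/kubelet/pods/<uuid>
#   fi
```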
… Unable to attach or mount volumes … timed out waiting for the condition …
References
Enabling kube-proxy IPVS mode prevents access to API server via service IP #1461
Enable IPVS Mode in Kube Proxy on a ready Kubernetes Local Cluster
Kubernetes stuck on ContainerCreating