Calico CNI pods not starting on Kubernetes (CentOS VM) — network unreachable to API server

Summary

Calico CNI pods fail to start on a Kubernetes cluster running on a CentOS VM. The calico-node pod cannot reach the API server at https://10.96.0.1:443, and the connection attempt fails with a network unreachable error.

Root Cause

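The calico-node pod cannot connect to the API server and reports a network unreachable error.
The pod dials https://10.96.0.1:443. This is typically the ClusterIP of the built-in kubernetes Service (the first address of the common default service CIDR, 10.96.0.0/12), not the address of a real host. A network unreachable error means the node's kernel found no route to that address, so the connection fails locally before any packet leaves the machine.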
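
The wording of the dial error already narrows down which layer failed. The following is a minimal sketch of that mapping, assuming the error strings are the ones the Go networking stack normally prints; the function name and the labels it prints are illustrative and not part of Calico or kubectl.

```shell
# Map the text of a failed API server dial to the layer that most likely
# produced it. classify_dial_error and its output labels are hypothetical
# helpers for illustration only.
classify_dial_error() {
  case "$1" in
    *"network is unreachable"*) echo "routing: kernel found no route to the destination" ;;
    *"no route to host"*)       echo "host or firewall: host unreachable or a REJECT rule" ;;
    *"connection refused"*)     echo "port: host reachable, nothing listening on the port" ;;
    *"i/o timeout"*)            echo "drop: packets discarded or the path is broken" ;;
    *)                          echo "unknown" ;;
  esac
}
```

Passing the error text from the pod log as the single argument prints one line of interpretation. For the error in this report, the result points at routing on the node rather than at the API server itself.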

Why This Happens in Real Systems

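This failure typically comes from one of the following:

Network configuration issues: The CentOS VM may have no default route (for example, an isolated or host-only interface with no gateway), so no route covers the service address and the kernel rejects the connection locally.
Firewall rules: Host firewall rules (for example firewalld or iptables) may block traffic from the calico-node pod to the API server, although a blocked connection more often surfaces as a timeout or as "no route to host" than as "network unreachable".
DNS resolution issues: DNS is an unlikely cause here, because the pod dials a literal IP address (10.96.0.1) and no name lookup is involved. It matters only if the API server is configured by hostname.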
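
The first cause in the list can be checked from the node's routing table. This is a minimal sketch, assuming the table is printed by ip route show and that a default route appears as a line starting with "default"; the function name is hypothetical. A more specific route that covers the service CIDR would also be enough, and this check does not look for one.

```shell
# Read routing-table text (as printed by `ip route show`) on stdin and
# report whether it contains a default route.
# has_default_route is an illustrative helper, not a standard tool.
has_default_route() {
  if grep -E '^default([[:space:]]|$)' >/dev/null; then
    echo "default route present"
  else
    echo "no default route"
    return 1
  fi
}

# On the node, one might run:
#   ip route show | has_default_route
```

Reading from stdin keeps the check independent of how the routing table was collected, so the same function also works on output saved from another machine.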

Real-World Impact

While the issue persists:

Calico CNI pods do not start, so the cluster has no working pod network and nodes typically stay NotReady.
Networking issues: Pods that depend on the CNI plugin, such as CoreDNS, cannot be given a network and typically remain Pending or ContainerCreating.
Cluster instability: Workloads cannot be scheduled or reached reliably, which leads to downtime and lost productivity.
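
To see how far the impact has spread, it can help to list the pods that are not running. The sketch below assumes the usual kubectl get pods column order (NAME, READY, STATUS) and only inspects the status column, so a pod that is Running but not ready is not reported; the function name is hypothetical.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the name
# and status of every pod whose STATUS is neither Running nor Completed.
# not_running_pods is an illustrative helper.
not_running_pods() {
  awk 'NF >= 3 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Possible usage on a machine with cluster access:
#   kubectl -n kube-system get pods --no-headers | not_running_pods
```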

Example or Code

kubectl -n kube-system logs calico-node-kptdb -c install-cni

This command prints the logs of the install-cni container in the calico-node-kptdb pod (the -c flag selects the container), which helps diagnose the issue.
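
When the log is long, the failing destination can be pulled out of it directly. This is a minimal sketch, assuming the errors use Go's standard wording ("dial tcp HOST:PORT: connect: network is unreachable"); the function name is hypothetical.

```shell
# Read log text on stdin and print each distinct HOST:PORT whose dial
# failed with "network is unreachable".
# unreachable_targets is an illustrative helper, not part of kubectl.
unreachable_targets() {
  grep -F 'network is unreachable' \
    | sed -nE 's/.*dial tcp ([^ ]+): connect: network is unreachable.*/\1/p' \
    | sort -u
}

# Possible usage with the command above:
#   kubectl -n kube-system logs calico-node-kptdb -c install-cni | unreachable_targets
```

If the only address printed is the service address, the problem is most likely on the node's route to the service network rather than in Calico itself.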

How Senior Engineers Fix It

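Senior engineers typically work through the following:

Verifying network configuration: Confirming that the CentOS VM has a route that covers the service address (usually a default route), that the node's interface and IP are configured correctly, and that kube-proxy is running, since it translates the service address to the real API server endpoint.
Checking firewall rules: Inspecting firewalld and iptables rules to ensure that traffic from the calico-node pod to the API server is not blocked.
Resolving DNS issues: Ruling DNS out by checking whether the API server is addressed by name anywhere. With a literal IP such as 10.96.0.1, DNS is not involved.
Updating Calico configuration: If the service address cannot be made reachable, pointing Calico at an API server address that is, for example through the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables.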
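
If the missing route turns out to be the cause, the fix is a single ip route add command. The sketch below only prints that command so it can be reviewed before anyone runs it as root. The function name is hypothetical, and the destination, gateway, and device are placeholders the operator must supply; none of them come from the original report.

```shell
# Print (do not run) the command that would add a route on the node.
# All three arguments are operator-supplied placeholders.
# print_route_fix is an illustrative helper.
print_route_fix() {
  if [ "$#" -ne 3 ]; then
    echo "usage: print_route_fix <destination> <gateway> <device>" >&2
    return 2
  fi
  printf 'ip route add %s via %s dev %s\n' "$1" "$2" "$3"
}

# Example with made-up values (192.0.2.1 is a documentation address):
#   print_route_fix default 192.0.2.1 eth0
```

A route added with ip route add lasts only until the next reboot or network restart, so a permanent fix also needs the route recorded in the node's network configuration.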

Why Juniors Miss It

Junior engineers may miss this issue because:

Lack of experience: Without prior exposure to Kubernetes or Calico, it is hard to know which pod, container, and log to inspect first.
Limited knowledge of network configuration: Without a solid grasp of routing, it is easy to miss that 10.96.0.1 is a virtual service address that still needs a matching route on the node.
Overlooking error messages: The network unreachable message in the calico-node pod logs is easy to skim past or to treat as a generic connectivity failure, so the diagnosis starts in the wrong place.