Summary
A Kubernetes TCP readiness probe against a Kafka broker initially fails with “connection refused” because the probe attempts a TCP connection before the broker has bound its listener port. This is not specific to Kafka; any TCP readiness probe behaves this way while the target process is still starting.
Root Cause
The root cause of this issue is:
- The Kafka broker takes some time to fully start and listen on the specified port (9092 in this case)
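- The TCP readiness probe succeeds only if the kubelet can open a TCP connection to the port, which requires a process to be listening on it; a successful connection still does not guarantee that the broker is ready to serve requests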
- If the probe checks the port before the Kafka broker is fully started, it will return a “connection refused” error
Why This Happens in Real Systems
This happens in real systems because:
- Container startup times can vary depending on the system resources and the complexity of the container startup process
- Readiness probes check whether a container is ready to receive traffic, but they know nothing about how long startup takes; unless initialDelaySeconds covers that window, the first probes run against a process that is not yet listening
- TCP sockets require a process to be listening on the port in order to establish a connection
Real-World Impact
The real-world impact of this issue is:
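- Delayed readiness: The container itself starts normally, but the pod stays NotReady and is kept out of Service endpoints until a probe succeeds, which can be up to one probe period after the broker begins listening
- Increased latency: Clients that depend on the broker wait longer for it to become reachable through its Service; the probes themselves are cheap and add negligible load
- Potential errors: If the probe keeps failing, the pod never becomes Ready, which can stall a rollout or cause a deployment to be reported as failed; readiness failures alone do not restart the container (restarts come from liveness or startup probes)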
Example or Code
apiVersion: v1
kind: Pod
metadata:
  name: kafka-broker
spec:
  containers:
    - name: kafka-broker
      image: confluentinc/cp-kafka:5.4.3
      ports:
        - containerPort: 9092
      readinessProbe:
        tcpSocket:
          port: 9092
        initialDelaySeconds: 15
        periodSeconds: 5
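The manifest above sets only two timing fields and leaves the rest to Kubernetes defaults. As a reading aid, here is a sketch of the same probe with those defaults written out; the behavior is unchanged and only the probe fragment is shown.

```yaml
# Same readiness probe as above, with the omitted fields set to their
# Kubernetes default values so the full timing is visible.
readinessProbe:
  tcpSocket:
    port: 9092
  initialDelaySeconds: 15   # wait 15s after container start before the first probe
  periodSeconds: 5          # then probe every 5s
  timeoutSeconds: 1         # default: each connection attempt must complete within 1s
  successThreshold: 1       # default: one successful connection marks the container Ready
  failureThreshold: 3       # default: three consecutive failures mark a Ready container NotReady
```

With these values, a broker that needs longer than 15 seconds to bind port 9092 will see at least one “connection refused” failure, and the container should become Ready on the first probe that runs after the port is open.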
How Senior Engineers Fix It
Senior engineers fix this issue by:
- Increasing initialDelaySeconds on the readiness probe so that the first check runs after the broker has typically bound its port
- Adjusting periodSeconds to reduce how often the probe fires while the broker is still starting
- Using a more specific readiness probe, such as an exec probe (or an httpGet probe where the workload exposes an HTTP health endpoint), that checks the actual state of the broker rather than only the port
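One way these fixes might be combined is sketched below. It is an illustration under stated assumptions, not a tested configuration: it assumes a cluster version that supports startupProbe, that the image provides the kafka-broker-api-versions CLI on its PATH, and that the broker is reachable at localhost:9092 from inside the container. The timing values are placeholders to tune against observed startup times.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kafka-broker
spec:
  containers:
    - name: kafka-broker
      image: confluentinc/cp-kafka:5.4.3
      ports:
        - containerPort: 9092
      # Readiness probing is held back until this probe succeeds once,
      # i.e. until port 9092 accepts a TCP connection.
      startupProbe:
        tcpSocket:
          port: 9092
        periodSeconds: 10
        failureThreshold: 30   # assumed budget: 30 attempts * 10s = up to 300s to start
      # Assumption: kafka-broker-api-versions exists in the image and the
      # broker answers on localhost:9092 from inside the container.
      readinessProbe:
        exec:
          command:
            - kafka-broker-api-versions
            - --bootstrap-server
            - localhost:9092
        periodSeconds: 10
        timeoutSeconds: 10     # the CLI starts a JVM, so allow more than the 1s default
        failureThreshold: 3
```

The startup probe absorbs the slow-start window, so the readiness probe no longer needs a large initialDelaySeconds. Note the trade-off: if the startup probe exhausts its failure budget, the kubelet kills the container and the pod's restartPolicy applies, so the budget should comfortably exceed the slowest expected start.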
Why Juniors Miss It
Juniors may miss this issue because:
- Lack of understanding of how TCP sockets and readiness probes work in Kubernetes
- Insufficient experience with container startup times and probe configurations
- Overreliance on default configurations, which may not be suitable for all use cases