Docker Healthcheck Failure After Migrating to Chainguard Node Images
Summary
Migrating from Node Alpine to Chainguard Node images caused the Docker healthcheck to fail silently. The container remained in an unhealthy state because the healthcheck command relied on wget, which is not available in Chainguard’s minimal image footprint. The solution involves switching to built-in Node.js mechanisms for health verification rather than depending on external utilities.
Root Cause
- Chainguard images follow a “distroless” philosophy and ship with only the absolute minimum packages required to run the application
- The original healthcheck used wget --spider http://localhost:3000/, which expected the wget binary to be present
- Chainguard Node images do not include wget by default; they use Wolfi as the base instead of Alpine
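- With the binary missing, Docker cannot execute the healthcheck command at all, so every check counts as a failure and the container is eventually marked unhealthy
- No error message appears in the application logs because Docker treats the missing command as a healthcheck failure, not a configuration error; the failure output is recorded only in the container's health state, which docker inspect exposes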
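Docker keeps the output of recent healthcheck runs in the container's health state, so the missing binary can be confirmed without touching the application. The sketch below summarizes the JSON printed by docker inspect --format '{{json .State.Health}}' <container>. The function name summarizeHealth and the sample object are hypothetical; the sample is illustrative and was not captured from a real run, so the exact exit code and message may differ.

```javascript
// Summarize the health state reported by:
//   docker inspect --format '{{json .State.Health}}' <container>
function summarizeHealth(health) {
  const log = health.Log || []; // Log may be null before the first check runs
  const last = log[log.length - 1];
  return {
    status: health.Status,
    failingStreak: health.FailingStreak,
    lastExitCode: last ? last.ExitCode : null,
    lastOutput: last ? (last.Output || '').trim() : '',
  };
}

// Illustrative sample only (assumed shape of a failing check on a missing binary).
const sample = {
  Status: 'unhealthy',
  FailingStreak: 30,
  Log: [
    { ExitCode: -1, Output: 'exec: "wget": executable file not found in $PATH\n' },
  ],
};

console.log(summarizeHealth(sample));
```

An output line that mentions an executable not being found points at the healthcheck configuration rather than at the application.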
Why This Happens in Real Systems
- Teams copy healthcheck configurations from one image to another without verifying tool availability in the new base image
- Alpine-based images bundle BusyBox, which provides wget and a basic shell out of the box; other minimal images omit even these (curl and bash are add-on packages on Alpine as well)
- Chainguard images prioritize security and minimal attack surface over convenience, intentionally excluding tools that could be security risks
- Documentation often fails to highlight these breaking changes when switching base images
- The migration from Alpine to Chainguard is becoming common due to Chainguard’s security-first approach, making this a widespread issue
Real-World Impact
- Containers start but never report healthy, which blocks load balancer registration in production environments that gate on health checks
- Deployment pipelines may hang or timeout waiting for healthy containers
- Orchestrators such as Docker Swarm will not route traffic to unhealthy containers; Kubernetes ignores Docker healthchecks, but an exec probe that calls wget fails the same way
- Monitoring alerts may fire incorrectly, creating noise and masking real issues
- Teams waste debugging time assuming the application itself is broken rather than the healthcheck configuration
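How long a pipeline waits before the container flips to unhealthy follows from the healthcheck settings. The sketch below gives rough bounds, assuming Docker schedules each check one interval after the previous check completes, that no start_period is configured, and that every check fails. The function name secondsUntilUnhealthy is hypothetical; the inputs are the 5s interval, 3s timeout, and 30 retries used in this article's configuration.

```javascript
// Rough bounds (in seconds) on the time until Docker reports "unhealthy".
// min: every check fails instantly (for example, the binary is missing).
// max: every check runs until the timeout before it is counted as failed.
function secondsUntilUnhealthy({ interval, timeout, retries }) {
  return {
    min: retries * interval,
    max: retries * (interval + timeout),
  };
}

console.log(secondsUntilUnhealthy({ interval: 5, timeout: 3, retries: 30 }));
// → { min: 150, max: 240 }
```

With a missing binary the checks fail immediately, so the lower bound of about two and a half minutes is the realistic figure for this configuration.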
Example or Code
The original failing configuration:
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:3000/"]
      interval: 5s
      timeout: 3s
      retries: 30
A working solution using Node.js built-in HTTP module:
    healthcheck:
      test: ["CMD", "node", "-e", "require('http').get('http://localhost:3000/', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"]
      interval: 5s
      timeout: 3s
      retries: 30
Alternative using a dedicated health endpoint in the application (the global fetch requires Node.js 18 or later):
    healthcheck:
      test: ["CMD", "node", "-e", "fetch('http://localhost:3000/health').then(r => { if (!r.ok) throw new Error(); process.exit(0); }).catch(() => process.exit(1))"]
      interval: 5s
      timeout: 3s
      retries: 30
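The /health variant only works if the application actually serves that path. A minimal sketch of such an endpoint using only the built-in http module follows; the names handleRequest and createServer, the JSON body, and the 404 fallback are assumptions for illustration, and a real application would attach the route to its existing server or framework instead.

```javascript
const http = require('http');

// Answer 200 on GET /health and 404 for everything else.
function handleRequest(req, res) {
  if (req.method === 'GET' && req.url === '/health') {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ status: 'ok' }));
    return;
  }
  res.writeHead(404, { 'Content-Type': 'text/plain' });
  res.end('Not found');
}

// Build the server without starting it, so the caller chooses the port.
function createServer() {
  return http.createServer(handleRequest);
}

// To serve on the port the healthcheck targets:
// createServer().listen(3000);
```

Keeping the handler cheap and free of downstream calls means the check reports on the process itself rather than on its dependencies; whether that is the right signal depends on how the orchestrator reacts to an unhealthy state.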
How Senior Engineers Fix It
- Audit all external dependencies in healthcheck commands when changing base images
- Prefer native language solutions (Node.js HTTP module) over system utilities for portability
- Create a dedicated /health or /healthz endpoint in the application for explicit health verification
- Add a small healthcheck script to the project that can be reused across environments:

    // healthcheck.js
    const http = require('http');

    const port = process.argv[2] || 3000;

    const req = http.get(`http://localhost:${port}/`, (res) => {
      process.exit(res.statusCode === 200 ? 0 : 1);
    });

    req.on('error', () => process.exit(1));

- Document base image requirements in the project’s README
- Use multi-stage builds to include debugging tools only in development images while keeping production minimal
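The audit step can be partly automated. The sketch below takes a healthcheck test value in the Compose forms (a string, or a list starting with NONE, CMD, or CMD-SHELL), works out which executable it needs, and checks whether that executable is present when the script runs inside the target image. The helper names requiredBinary and isAvailable are hypothetical, and mapping the shell forms to /bin/sh is an assumption about the container's default shell.

```javascript
const fs = require('fs');
const path = require('path');

// Return the executable a healthcheck `test` value needs, or null for NONE.
function requiredBinary(test) {
  if (typeof test === 'string') return '/bin/sh'; // string form runs through the shell
  const [kind, first] = test;
  if (kind === 'NONE') return null;
  if (kind === 'CMD-SHELL') return '/bin/sh'; // assumed default shell
  return first; // "CMD": the first argument is executed directly
}

// Check whether a binary is executable, searching PATH for bare names.
function isAvailable(binary, searchPath = process.env.PATH || '') {
  const candidates = binary.includes('/')
    ? [binary]
    : searchPath.split(path.delimiter).filter(Boolean).map((dir) => path.join(dir, binary));
  return candidates.some((file) => {
    try {
      fs.accessSync(file, fs.constants.X_OK);
      return fs.statSync(file).isFile();
    } catch {
      return false;
    }
  });
}

const test = ['CMD', 'wget', '--spider', 'http://localhost:3000/'];
const binary = requiredBinary(test);
console.log(`${binary}: ${isAvailable(binary) ? 'found' : 'missing'}`);
```

Running this with the new base image before a migration surfaces a missing wget or a missing shell up front, instead of leaving it to an unhealthy container in production.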
Why Juniors Miss It
- Assume all Linux distributions include standard utilities like wget, curl, and bash
- Focus only on application code functionality and overlook infrastructure configuration
- Lack awareness of the differences between Alpine, Debian, and distroless-based images
- Do not test healthchecks in non-production environments before deploying to production
- Trust that migration guides cover all edge cases (they often do not)
- May not understand that healthcheck failures prevent container orchestration systems from working correctly