Fixing Orphaned Kamal Containers After Config Split

Summary

During a configuration refactor, a deployment engineer split a monolithic deploy.yml into environment-specific files (deploy.production.yml and deploy.staging.yml). While this is a best practice for managing multiple environments, the act of renaming or restructuring the configuration caused a loss of stateful association between the Kamal CLI and the existing running containers on the production server.

The production environment is currently running a “ghost” application: a container managed by the original default configuration that no longer matches the metadata/destination definitions in the new production config. Consequently, the CLI cannot “see” the existing containers, making remote execution impossible and risking a port conflict and downtime during the next deployment attempt.

Root Cause

The issue stems from how Kamal tracks application identity and deployment targets.

  • Configuration-to-Destination Mapping: Kamal derives the service name from the configuration's service: key and the destination from the -d flag, which selects deploy.<destination>.yml. The original deploy.yml was deployed with no destination, so when the configuration was split, that “default” application context was lost.
  • Metadata Mismatch: The existing containers on the server were labeled and managed under the context of the old deploy.yml. Deploying with deploy.production.yml defines a new deployment context whose destination is production.
  • Orphaned Resources: The running containers are technically “orphaned.” They are still serving traffic via kamal-proxy, but because the new configuration’s destination or service name doesn’t align with the labels on the existing containers, the Kamal CLI treats the target as a blank slate.
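
One way to see the mismatch on the server is to reproduce the kind of label filter the CLI is believed to apply. The sketch below assumes that Kamal labels containers with service and destination and selects on those labels; that matches recent versions as far as I know, but confirm it with docker inspect on your own host. The helper name kamal_filter_args and the service name myapp are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the `docker ps` label filters that correspond
# to a service name and an optional destination.
# Assumption: containers carry `service` and `destination` labels.
kamal_filter_args() {
  service="$1"
  destination="${2:-}"
  args="--filter label=service=${service}"
  if [ -n "$destination" ]; then
    args="${args} --filter label=destination=${destination}"
  fi
  printf '%s\n' "$args"
}

# Example (run on the server; not executed here):
#   docker ps $(kamal_filter_args myapp)             # filter without a destination
#   docker ps $(kamal_filter_args myapp production)  # filter that -d production implies
```

If the first command lists the running container and the second lists nothing, the container was deployed without a destination and is invisible to any command run with -d production.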

Why This Happens in Real Systems

In distributed systems and deployment tooling, identity is everything.

  • Implicit vs. Explicit Configuration: Many tools rely on an implicit “default” state. When you move from an implicit state (one file) to an explicit state (multiple files), you must explicitly re-map the identity of the existing infrastructure.
  • Stateful Deployment: Tools like Kamal and Kubernetes rely on labels to determine which resources belong to which deployment unit, and Capistrano relies in a similar way on a fixed deploy path. If the label changes, the tool no longer selects the old resources and attempts to create new ones.
  • The “Clean Slate” Fallacy: Engineers often assume that as long as the code is the same, the deployment will be seamless. However, the deployment orchestrator requires a continuous chain of identity to perform rolling updates.

Real-World Impact

  • Service Downtime: A standard kamal deploy may attempt to spin up new containers on the same ports/hostnames used by the “orphaned” containers, which can lead to port binding errors.
  • Loss of Observability: Commands like kamal details or kamal app exec, run against the new destination, will return empty results or errors, leaving engineers “blind” to the actual state of production.
  • Traffic Disruption: If the deployment fails halfway through due to resource conflicts, the kamal-proxy may be left in an inconsistent state, potentially dropping traffic to both the old and new containers.
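
A pre-deploy check can surface these orphans before they cause a conflict. The sketch below is one possible approach, not something Kamal provides: the find_orphans helper is hypothetical, and it assumes the containers carry service and destination labels. It reads tab-separated lines of name, service, and destination, and prints the containers that belong to the service but carry a different (or empty) destination.

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>service<TAB>destination" lines on stdin
# and print the names of containers that belong to the given service but whose
# destination label differs from the expected one (an empty label counts).
find_orphans() {
  awk -F '\t' -v svc="$1" -v dest="$2" '$2 == svc && $3 != dest { print $1 }'
}

# Example (run on the server; not executed here):
#   docker ps --format '{{.Names}}\t{{.Label "service"}}\t{{.Label "destination"}}' \
#     | find_orphans myapp production
```

Any name printed is a container that a deploy with -d production would not treat as part of the same application.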

Example or Code

To fix this, you must ensure the new production configuration resolves to the original service name (the service: key) and use the -d flag to select the production destination file. If the service name changed during the split, you must align them.

# Ensure your deploy.production.yml resolves to the exact same 'service'
# name as the original deploy.yml had.

# Print the resolved configuration for the destination and check the
# service name and hosts before touching the server.
# Note: the output can include secrets.
kamal config -d production

# Deploy against the production destination once the identity matches.
kamal deploy -d production

How Senior Engineers Fix It

A senior engineer does not simply “try again” and hope for the best. They follow a reconciliation strategy:

  1. Audit Identity: Inspect the existing containers on the server (via docker ps) to identify the exact app name and labels used by the original deployment.
  2. Synchronize Config: Update deploy.production.yml so that the service: key matches the service label of the running containers exactly.
  3. Validate Destination: Ensure the servers listed in the new file are the same host(s) as in the original.
  4. Zero-Downtime Transition: Review the resolved configuration (for example with kamal config -d production) before deploying. If the service name and destination line up with the labels on the running containers, Kamal should recognize them as part of the same “app” and perform a standard rolling update (start new -> health check -> stop old).
  5. Manual Cleanup (If Necessary): If the identity is irreconcilable, a senior engineer would schedule a brief maintenance window to manually stop the orphaned containers before running the new deployment to prevent port conflicts.
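
The audit and synchronize steps can be partly scripted. The following is a minimal sketch under stated assumptions: config_service and check_identity are hypothetical helpers, the service value is assumed to be an unquoted scalar on a single top-level service: line, and the running label is assumed to have been read off the server by hand (for example from docker inspect). It does plain text matching rather than real YAML parsing.

```shell
#!/bin/sh
# Hypothetical helper: print the top-level `service:` value from a config file.
# Assumes an unquoted value on one line; anything after whitespace or '#' is dropped.
config_service() {
  sed -n 's/^service:[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1" | head -n 1
}

# Hypothetical helper: compare the configured service name with the label
# observed on the running container. Returns non-zero on a mismatch.
check_identity() {
  configured="$(config_service "$1")"
  running="$2"
  if [ "$configured" = "$running" ]; then
    echo "match: $configured"
  else
    echo "mismatch: config=$configured running=$running"
    return 1
  fi
}

# Example (not executed here), with the label value copied from the server:
#   check_identity config/deploy.yml myapp && kamal deploy -d production
```

Gating the deploy on the check keeps a renamed service from reaching production unnoticed. It does not cover the destination label, which still needs the manual audit from step 1.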

Why Juniors Miss It

  • Focusing on Code, Not Orchestration: Juniors often focus on whether the Dockerfile and the application logic are correct, neglecting the metadata and orchestration layer that connects the code to the hardware.
  • Assuming Continuity: There is a tendency to assume that changing a file structure is a “cosmetic” change, not realizing that for deployment tools, configuration structure defines operational identity.
  • Fear of the CLI: When kamal details returns nothing, a junior might assume the server is down or the connection is broken, rather than realizing they have effectively “lost the keys” to the existing containers due to a configuration mismatch.
