Summary
The engineering team ran into configuration drift and source-of-truth fragmentation. They used Terraform to provision the “shell” of a Google Cloud Run service, but deployed the actual revisions with the gcloud CLI and manually parsed JSON files. This created a split-brain scenario in which the infrastructure state (Terraform) and the application state (CLI/JSON) were disconnected, adding maintenance overhead and risking inconsistent deployments.
Root Cause
The failure in the architectural pattern stems from two primary flaws:
- Bypassing the State Machine: By using `lifecycle { ignore_changes = all }`, the team explicitly told Terraform to stop managing every attribute of the service after creation, including the most critical ones. This effectively turned Terraform into a one-time setup tool rather than a continuous state manager.
- Schema-less Configuration Management: Relying on custom JSON files and manual CLI string interpolation introduces runtime fragility. Nothing validates that the values passed to `gcloud run deploy` are actually accepted by Cloud Run, so mistakes surface only when the deployment runs.
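To make the second flaw concrete: in Terraform, the same settings can be typed inputs that are checked before anything is deployed. The sketch below is illustrative only. The variable names `service_memory` and `service_cpu` are hypothetical, and the CPU allow-list is an assumed team policy, not the full set of values Cloud Run accepts. With it, a malformed value such as `512MB` fails at `terraform plan` instead of halfway through a deployment.

```hcl
# Hypothetical input: memory limit for the Cloud Run container.
variable "service_memory" {
  type        = string
  description = "Memory limit for the container, e.g. 512Mi or 2Gi"
  default     = "512Mi"

  validation {
    # Accept only whole-number Mi/Gi quantities; anything else fails at plan time.
    condition     = can(regex("^[0-9]+(Mi|Gi)$", var.service_memory))
    error_message = "service_memory must look like 512Mi or 2Gi."
  }
}

# Hypothetical input: CPU limit, restricted to an assumed team allow-list.
variable "service_cpu" {
  type        = string
  description = "CPU limit for the container"
  default     = "1"

  validation {
    condition     = contains(["1", "2", "4", "8"], var.service_cpu)
    error_message = "service_cpu must be one of 1, 2, 4, or 8."
  }
}
```

These variables would then be wired into the `resources.limits` block of the Cloud Run resource, so the validated values and the deployed values are the same thing.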
Why This Happens in Real Systems
This pattern is extremely common in growing organizations due to:
- The “Speed vs. Safety” Trap: Teams often use CLI commands for deployments because they feel “faster” or “easier” to integrate into existing Jenkins/GitHub Actions pipelines than editing Terraform configuration and running `terraform apply`.
- Separation of Concerns Misunderstanding: Developers often mistakenly believe that Infrastructure (the Cloud Run service) and Deployment (the specific Revision) should be handled by different tools, failing to realize that in a declarative world, they are part of the same logical state.
- Legacy Pipeline Inertia: Once a CI/CD pipeline is built around a specific CLI command, the friction of migrating to a declarative model (like Terraform or Kustomize) feels too high, even as the technical debt accumulates.
Real-World Impact
- Configuration Drift: A developer might manually change a memory limit via the Console or CLI, and Terraform will never detect or revert this change, leading to “it works in staging but fails in prod” scenarios.
- Lack of Auditability: When configuration lives in ephemeral JSON files and CLI arguments, you lose the GitOps advantage. You cannot easily see a single diff that shows how a change in CPU allocation affects the total infrastructure footprint.
- Increased MTTR (Mean Time To Recovery): During an incident, engineers cannot rely on `terraform plan` to see what the intended state is, because the actual state is buried in the history of CLI executions.
Example or Code
The “Incorrect” pattern used by the team bypasses Terraform’s state management entirely:
```hcl
resource "google_cloud_run_v2_service" "my_service" {
  name     = "my-service"
  location = "us-central1"

  # DANGER: This makes Terraform blind to all actual deployments
  lifecycle {
    ignore_changes = all
  }

  template {
    containers {
      image = "gcr.io/cloudrun/hello"
    }
  }
}
```
The “Senior” approach uses a Single Source of Truth, where the CI/CD pipeline updates the image tag variable in a controlled manner:
```hcl
variable "service_image_tag" {
  type        = string
  description = "The specific image tag to deploy"
}

resource "google_cloud_run_v2_service" "my_service" {
  name     = "my-service"
  location = "us-central1"

  template {
    containers {
      image = "gcr.io/my-project/my-app:${var.service_image_tag}"

      resources {
        limits = {
          cpu    = "1"
          memory = "512Mi"
        }
      }
    }
  }
}
```
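Because the image now lives in configuration, it can also be surfaced directly from Terraform. A minimal sketch, assuming the `my_service` resource above; the output names are hypothetical, while `uri` and the nested `template`/`containers` blocks are attributes of `google_cloud_run_v2_service`.

```hcl
# Hypothetical outputs: report the image and URL recorded in Terraform state,
# so the deployed version can be read without digging through CLI history.
output "my_service_image" {
  description = "Container image declared for my-service"
  value       = google_cloud_run_v2_service.my_service.template[0].containers[0].image
}

output "my_service_uri" {
  description = "URL that Cloud Run assigned to my-service"
  value       = google_cloud_run_v2_service.my_service.uri
}
```

After an apply, `terraform output my_service_image` prints the image recorded in state, which is the kind of quick answer the MTTR point above says engineers lack during an incident.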
How Senior Engineers Fix It
A Senior Engineer would move the organization toward a Declarative Deployment Model. Depending on the team’s maturity, there are two preferred paths:
- The Terraform-Centric Path (Recommended for Infra-heavy teams):
  - Remove `ignore_changes = all`.
  - Treat the image tag as a Terraform variable.
  - The CI/CD pipeline does not run `gcloud run deploy`. Instead, it commits the new tag to a `version.tfvars` file, or uses a tool like Atlantis to run `terraform apply`.
  - Benefit: Every change to CPU, memory, or image is captured in the Terraform state and Git history.
- The GitOps/Knative Path (Recommended for K8s-native teams):
  - If the team wants to treat Cloud Run as “Knative,” it can describe the service with Config Connector and let a GitOps controller such as ArgoCD sync those manifests from Git.
  - The configuration is stored as Kubernetes-style YAML manifests in Git instead of in CLI arguments.
  - Benefit: This provides a unified way to manage both the application and its infrastructure using standard Kubernetes patterns.
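For the Terraform-centric path, the file the pipeline commits can be a single assignment to the `service_image_tag` variable from the example section. A sketch; the tag value is hypothetical.

```hcl
# version.tfvars: the CI job rewrites only this line and commits it,
# so every deployment shows up as a one-line, reviewable Git diff.
service_image_tag = "1.4.2" # hypothetical tag value
```

Terraform automatically loads only `terraform.tfvars` and `*.auto.tfvars` files, so a file named `version.tfvars` has to be passed explicitly, for example `terraform apply -var-file=version.tfvars`. Naming it `version.auto.tfvars` instead would let a plain `terraform apply` pick it up.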
Why Juniors Miss It
- Focus on “Make it Work”: Juniors focus on the immediate goal (deploying the code) rather than the long-term lifecycle of the resource.
- Tool Siloing: They view `gcloud` and `terraform` as two separate, unrelated tools rather than seeing Terraform as the authority and `gcloud` as a manual override.
- Ignoring the Lifecycle Block: They see `ignore_changes` as a “handy way to stop errors” rather than recognizing it as a break in the chain of custody for the infrastructure’s state.
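When a field genuinely is written by another system, the narrower fix is to name that field instead of using `all`. A sketch, assuming the noise comes from `gcloud` stamping the `client` and `client_version` arguments of `google_cloud_run_v2_service`; whether a given team actually needs to ignore these two fields is an assumption.

```hcl
resource "google_cloud_run_v2_service" "my_service" {
  name     = "my-service"
  location = "us-central1"

  template {
    containers {
      image = "gcr.io/cloudrun/hello"
    }
  }

  lifecycle {
    # Ignore only the client metadata fields; image, CPU, memory,
    # and every other attribute remain under Terraform's control.
    ignore_changes = [client, client_version]
  }
}
```

Scoping the list this way keeps `terraform plan` quiet about harmless metadata while still flagging drift in the settings that matter.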