How to store Google Cloud Run config in version control: Terraform or Kubernetes?

Summary

The engineering team ran into configuration drift and a fragmented source of truth. They used Terraform to provision the “shell” of a Google Cloud Run service, but deployed the actual revisions with the gcloud CLI and hand-parsed JSON. This created a split-brain scenario in which the infrastructure state (Terraform) and the application state (CLI/JSON) were disconnected, adding maintenance overhead and the risk of inconsistent deployments.

Root Cause

The failure in the architectural pattern stems from two primary flaws:

  • Bypassing the State Machine: By using lifecycle { ignore_changes = all }, the team explicitly instructed Terraform to stop managing the most critical parts of the service. This effectively turned Terraform into a one-time setup tool rather than a continuous state manager.
  • Schema-less Configuration Management: Relying on custom JSON files and manual CLI string interpolation introduces runtime fragility. There is no validation to ensure that the values being passed to gcloud run deploy are actually compatible with the underlying provider’s requirements.
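
One way to get the validation that the JSON-plus-CLI approach lacks is to declare the settings as a typed Terraform variable with validation blocks. The sketch below is illustrative only: the variable name service_resources and the allowed CPU values are assumptions made for this example, not a statement of Cloud Run's actual limits.

```hcl
# Hypothetical variable; the name and the allowed values are assumptions for illustration.
variable "service_resources" {
  description = "CPU and memory limits for the Cloud Run container"
  type = object({
    cpu    = string
    memory = string
  })
  default = {
    cpu    = "1"
    memory = "512Mi"
  }

  # Rejects typos such as "512MB" before anything is applied.
  validation {
    condition     = can(regex("^[0-9]+(Mi|Gi)$", var.service_resources.memory))
    error_message = "memory must be a whole number followed by Mi or Gi, for example 512Mi."
  }

  # Example allow-list only; adjust it to the CPU sizes your team actually permits.
  validation {
    condition     = contains(["1", "2", "4"], var.service_resources.cpu)
    error_message = "cpu must be one of: 1, 2, 4."
  }
}
```

A service resource could then read var.service_resources.cpu and var.service_resources.memory when building its limits map, so a bad value fails in review rather than during a deploy.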

Why This Happens in Real Systems

This pattern is extremely common in growing organizations due to:

  • The “Speed vs. Safety” Trap: Teams often use CLI commands for deployments because they feel “faster” or “easier” to integrate into existing Jenkins/GitHub Actions pipelines than changing Terraform configuration and running terraform apply.
  • Separation of Concerns Misunderstanding: Developers often mistakenly believe that Infrastructure (the Cloud Run service) and Deployment (the specific Revision) should be handled by different tools, failing to realize that in a declarative world, they are part of the same logical state.
  • Legacy Pipeline Inertia: Once a CI/CD pipeline is built around a specific CLI command, the friction of migrating to a declarative model (like Terraform or Kustomize) feels too high, even as the technical debt accumulates.

Real-World Impact

  • Configuration Drift: A developer might change a memory limit through the Console or CLI, and because of ignore_changes = all, terraform plan will neither report nor revert the change, leading to “it works in staging but fails in prod” scenarios.
  • Lack of Auditability: When configuration lives in ephemeral JSON files and CLI arguments, you lose the GitOps advantage. You cannot easily see a single diff that shows how a change in CPU allocation affects the total infrastructure footprint.
  • Increased MTTR (Mean Time To Recovery): During an incident, engineers cannot rely on terraform plan to compare the intended state with the actual state, because the real configuration is defined only by the history of CLI executions.

Example or Code

The “Incorrect” pattern used by the team tells Terraform to ignore every change made to the service after it is created:

resource "google_cloud_run_v2_service" "my_service" {
  name     = "my-service"
  location = "us-central1"

  # DANGER: This makes Terraform blind to all actual deployments
  lifecycle {
    ignore_changes = all
  }

  template {
    containers {
      image = "gcr.io/cloudrun/hello"
    }
  }
}

The “Senior” approach keeps a Single Source of Truth: the CI/CD pipeline updates the image tag variable in a controlled manner, and Terraform applies it:

variable "service_image_tag" {
  type        = string
  description = "The specific image tag to deploy"
}

resource "google_cloud_run_v2_service" "my_service" {
  name     = "my-service"
  location = "us-central1"

  template {
    containers {
      image = "gcr.io/my-project/my-app:${var.service_image_tag}"
      resources {
        limits = {
          cpu    = "1"
          memory = "512Mi"
        }
      }
    }
  }
}
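
If the team also wants revision rollout to live in the same state, the traffic split can be declared on the same resource. The following is a sketch of an extended variant of the resource above (it would replace it, not sit next to it), under the assumption that all traffic should follow the newest revision; resource limits are omitted for brevity.

```hcl
variable "service_image_tag" {
  type        = string
  description = "The specific image tag to deploy"
}

resource "google_cloud_run_v2_service" "my_service" {
  name     = "my-service"
  location = "us-central1"

  template {
    containers {
      image = "gcr.io/my-project/my-app:${var.service_image_tag}"
    }
  }

  # Assumption for this example: 100% of traffic goes to the latest revision.
  traffic {
    type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
    percent = 100
  }
}
```

With this in place, a change to the image tag and a change to the traffic split both show up in the same terraform plan output.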

How Senior Engineers Fix It

A Senior Engineer would move the organization toward a Declarative Deployment Model. Depending on the team’s maturity, there are two preferred paths:

  1. The Terraform-Centric Path (Recommended for Infra-heavy teams):

    • Remove ignore_changes = all.
    • Treat the Image Tag as a Terraform variable.
    • The CI/CD pipeline does not run gcloud run deploy. Instead, it commits the new image tag to a version.tfvars file, and terraform apply is then run by the pipeline itself or by a tool like Atlantis.
    • Benefit: Every single change to CPU, Memory, or Image is captured in the Terraform state and Git history.
  2. The GitOps/Knative Path (Recommended for K8s-native teams):

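    • If the team wants to treat Cloud Run as “Knative,” they can manage it from a Kubernetes cluster with Config Connector, optionally synced by a GitOps tool like ArgoCD.
    • The configuration is stored in Git as a declarative YAML manifest: a Knative-style Service (not the core Kubernetes Service kind) or the equivalent Config Connector resource.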
    • Benefit: This provides a unified way to manage both the application and its infrastructure using standard Kubernetes patterns.
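
For the Terraform-centric path, the file the pipeline commits can be very small. A minimal sketch, assuming the version.tfvars filename mentioned above and a placeholder tag value:

```hcl
# version.tfvars (written by the CI/CD pipeline; the tag value is a placeholder)
# Apply with: terraform apply -var-file=version.tfvars
service_image_tag = "abc1234"
```

Terraform loads only terraform.tfvars and *.auto.tfvars automatically, so a file with this name has to be passed with -var-file or renamed to something like version.auto.tfvars.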

Why Juniors Miss It

  • Focus on “Make it Work”: Juniors focus on the immediate goal (deploying the code) rather than the long-term lifecycle of the resource.
  • Tool Siloing: They view gcloud and terraform as two separate, unrelated tools rather than seeing Terraform as the authority and gcloud as a manual override.
  • Ignoring the Lifecycle Block: They see ignore_changes as a “handy way to stop errors” rather than recognizing it as a break in the chain of custody for the infrastructure’s state.
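
When something outside Terraform really does need to touch the service, ignore_changes can name individual attributes instead of using all. The sketch below assumes the only fields the team tolerates drifting are the client metadata fields that deployment tooling tends to rewrite; which attributes to list is a judgment call for each team.

```hcl
resource "google_cloud_run_v2_service" "my_service" {
  name     = "my-service"
  location = "us-central1"

  template {
    containers {
      image = "gcr.io/cloudrun/hello"
    }
  }

  lifecycle {
    # Ignore only these two metadata attributes (an assumption for this example);
    # image, CPU, memory, and everything else stay under Terraform's control.
    ignore_changes = [
      client,
      client_version,
    ]
  }
}
```

Scoped this way, a manual change to the image or memory limit still appears as a diff in terraform plan.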
