Why GitLab CI pipelines can’t fetch public LFS objects

Summary

A CI/CD pipeline failed while fetching Git Large File Storage (LFS) objects from a secondary, public repository during a build. Although the target repository was public, the GitLab CI job received a 404 Not Found response from the LFS batch API call. The error message (“Repository or object not found”) showed the request going out with the gitlab-ci-token credential attached. The result was a broken build and an incomplete workspace.

Root Cause

The issue stems from a fundamental misunderstanding of how Git LFS authentication interacts with CI/CD job tokens.

  • Token Scoping: The gitlab-ci-token is a short-lived, scoped credential. By default it is tied to the project where the pipeline is running and does not automatically grant access to other projects.
  • LFS Protocol Handshake: When Git LFS runs, it first performs a batch API request to the LFS server to determine where to download the actual binary blobs.
  • Authorization Mismatch: Even though the repository is “public” for web browsing, the Git LFS client in the CI runner attempts to use the current job’s identity (gitlab-ci-token) to authenticate the LFS batch request.
  • Strict Access Control: GitLab’s LFS implementation rejects requests where the provided token does not have explicit permission to access the specific LFS object storage endpoint of the target repository, regardless of the repository’s visibility.
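
The handshake described above can be made concrete with a small sketch. It follows the Git LFS server-discovery convention (the remote URL, with .git appended if missing, plus /info/lfs) and the documented shape of a batch request body. The function names are illustrative, not part of any tool.

```shell
#!/bin/sh
# Derive the LFS batch endpoint from a Git remote URL, following the
# Git LFS server-discovery convention: <remote>.git/info/lfs/objects/batch.
lfs_batch_url() {
  remote=${1%/}                      # drop a trailing slash, if any
  case "$remote" in
    *.git) ;;                        # already ends in .git
    *) remote="$remote.git" ;;       # otherwise append it
  esac
  printf '%s/info/lfs/objects/batch\n' "$remote"
}

# Build the JSON body of a "download" batch request for a single object.
# $1 = SHA-256 object ID from the pointer file, $2 = size in bytes.
lfs_batch_payload() {
  printf '{"operation":"download","transfers":["basic"],"objects":[{"oid":"%s","size":%s}]}\n' "$1" "$2"
}
```

Any user:password pair embedded in the remote URL is carried over into the derived endpoint unchanged, which is consistent with the job token appearing in the failing batch URL in the CI log.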

Why This Happens in Real Systems

In distributed microservice architectures or complex mono-repo/multi-repo setups, dependencies are often spread across different namespaces.

  • Identity Propagation Failure: Most developers assume that “Public Visibility = No Auth Required.” However, in automated environments, the Git client attaches whatever credentials it has configured for the host, whether or not the server would have required them.
  • API vs. Git Protocol: A user can git clone a public repo via HTTPS without a password, but the LFS Batch API is a distinct HTTP endpoint. When a credential is presented, the server validates it rather than falling back to anonymous access.
  • Security Sandboxing: Modern CI/CD platforms follow the Principle of Least Privilege. A token generated for Project A is intentionally prevented from “reaching out” to Project B to prevent lateral movement in case a pipeline is compromised.
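
One way to test the “public means anonymous” assumption directly is to repeat the batch request with the credential removed. The sketch below is a diagnostic under assumptions, not a confirmed reproduction: the function names are hypothetical, it expects curl to be available, and it assumes the remote URL already ends in .git. If the diagnosis above holds, the anonymous probe against a public repository should return 200 where the job-token request returned 404.

```shell
#!/bin/sh
# Remove any "user:password@" userinfo from an HTTP(S) URL.
strip_userinfo() {
  printf '%s\n' "$1" | sed -E 's#^(https?://)[^/@]*@#\1#'
}

# Send an unauthenticated LFS batch request and print only the HTTP status.
# $1 = remote URL ending in .git, $2 = object ID, $3 = size in bytes.
# Defined for illustration; nothing calls it here.
probe_anonymous_lfs() {
  url="$(strip_userinfo "$1")"
  curl -sS -o /dev/null -w '%{http_code}\n' \
    -X POST \
    -H 'Accept: application/vnd.git-lfs+json' \
    -H 'Content-Type: application/vnd.git-lfs+json' \
    -d "{\"operation\":\"download\",\"transfers\":[\"basic\"],\"objects\":[{\"oid\":\"$2\",\"size\":$3}]}" \
    "${url%/}/info/lfs/objects/batch"
}
```

The object ID and size come from the LFS pointer file checked into the repository (its oid sha256: and size lines).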

Real-World Impact

  • Pipeline Fragility: Builds start failing as soon as new dependencies (like ML models or large assets) are added via LFS from another project.
  • Developer Friction: Engineers waste hours debugging “404 Not Found” errors on repositories they know are public.
  • Deployment Blockers: If the LFS object is a critical runtime dependency (e.g., a compiled library or a configuration blob), the entire deployment lifecycle halts.

Example

# The failing command in the CI log
$ git lfs pull
batch response: Repository or object not found: https://gitlab-ci-token:[MASKED]@gitlab.com/user/public.git/info/lfs/objects/batch

# The incorrect assumption:
# "It's a public repo, so I don't need to provide a token."

# The actual mechanism:
# The Git LFS client picks up the 'gitlab-ci-token' credential from the
# remote URL or Git config and sends it with the LFS batch request.

How Senior Engineers Fix It

Senior engineers solve this by decoupling the authentication of the primary repository from the authentication of the LFS assets.

  • Deploy Tokens: Instead of relying on the ephemeral gitlab-ci-token, create a project deploy token with the read_repository scope for the target repository. Store its username and value as CI/CD variables.
  • LFS Configuration Overrides: Explicitly configure the Git LFS client to use a different credential provider or URL for the specific LFS endpoint.
  • Pre-fetching Strategy: Use a before_script to clone the required assets into a specific directory using a dedicated, long-lived credential before the main build starts.
  • Explicit URL Mapping: Use .lfsconfig to define which LFS endpoint the client talks to, ensuring it doesn’t attempt to use the CI job token for repositories outside the current project.
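
The deploy-token and pre-fetching strategies can be combined in a before_script step along these lines. This is a minimal sketch under assumptions: ASSETS_DEPLOY_USER and ASSETS_DEPLOY_TOKEN are hypothetical CI/CD variable names, the helper functions are illustrative, and the token value is assumed to contain only URL-safe characters.

```shell
#!/bin/sh
# Insert deploy-token credentials into an HTTPS URL.
# $1 = deploy token username, $2 = deploy token value, $3 = HTTPS URL.
with_credentials() {
  rest=${3#https://}                 # strip the scheme so it can be re-added
  printf 'https://%s:%s@%s\n' "$1" "$2" "$rest"
}

# Clone the asset repository with the dedicated credential, then make sure
# its LFS objects are present. $1 = HTTPS URL, $2 = target directory.
# ASSETS_DEPLOY_USER / ASSETS_DEPLOY_TOKEN are hypothetical CI/CD variables.
# Defined for illustration; nothing calls it here.
prefetch_assets() {
  url="$(with_credentials "$ASSETS_DEPLOY_USER" "$ASSETS_DEPLOY_TOKEN" "$1")"
  git clone --depth 1 "$url" "$2" && git -C "$2" lfs pull
}
```

A job would call it as prefetch_assets https://gitlab.com/user/public.git vendor/assets before the main build. Note that a URL with embedded credentials is written to the clone’s .git/config, so the directory should be treated as sensitive or the remote URL reset afterwards.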

Why Juniors Miss It

  • The “Public” Fallacy: Juniors often assume that if they can see it in a browser, the Git CLI will treat it as an unauthenticated request. They fail to realize the Git client is autofilling credentials.
  • Tooling Abstraction: They treat git lfs as a “black box” that just works, rather than a separate protocol that makes its own independent HTTP API calls.
  • Debugging the Wrong Layer: They spend time checking file permissions or repository visibility instead of investigating the scope of the CI job token that the runner is presenting.
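
Debugging the right layer means looking at what the Git client actually sends. The sketch below uses hypothetical helper names: one checks whether a remote URL carries the job token, the other prints the configuration and HTTP trace for an LFS pull. GIT_TRACE and GIT_CURL_VERBOSE are standard Git environment variables that Git LFS also honors, and git lfs env prints the endpoint the client has resolved.

```shell
#!/bin/sh
# Succeeds (exit 0) when a URL carries the CI job token as userinfo.
has_embedded_job_token() {
  case "$1" in
    *://gitlab-ci-token:*@*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print what Git and Git LFS will actually use. Run inside the clone.
# Defined for illustration; nothing calls it here.
inspect_lfs_auth() {
  git remote -v                      # remote URLs, including any userinfo
  git lfs env                        # resolved LFS endpoint and settings
  # Rewrites, credential helpers, and extra headers that inject auth;
  # git config exits non-zero when nothing matches, hence "|| true".
  git config --show-origin --get-regexp \
    '^(url\..*|credential\..*|http\..*extraheader)$' || true
  GIT_TRACE=1 GIT_CURL_VERBOSE=1 git lfs pull   # full HTTP trace
}
```

The trace output can include credentials, so it is best run in a job whose logs are not public.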
