Summary
The objective was to implement an Origin Failover pattern at the edge: if a requested resource does not exist on the primary web server (returning a 404), the request should be transparently rerouted to a secondary storage provider like AWS S3. The engineer sought to avoid hitting the primary origin entirely if the content was known to be elsewhere, but found that Cloudflare’s standard rewrite rules are request-based rather than response-based.
Root Cause
The fundamental issue is the decoupling of request transformation from origin response codes.
- Request-side Rewrites: Cloudflare's URL rewrites and redirects (Transform Rules, and the older Page Rules) are evaluated in the request phase. They can change the URL before it reaches the origin based on patterns, headers, or cookies, but they never see the origin's response.
- Response-side Logic: The requirement is a response-side decision. A request-phase rule cannot fire on a 404 because, by the time the 404 is generated, the request has already traveled to the origin, consumed resources, and failed.
- Feature Gap: Cloudflare’s standard rules products do not natively support “Conditional Rewrites based on HTTP Status Codes” in the same way AWS CloudFront’s Origin Groups do.
Why This Happens in Real Systems
In complex architectures, engineers often confuse Routing with Rewriting:
- Routing determines where a request goes based on the incoming metadata.
- Rewriting determines what the URL looks like during the journey.
- Failover is a reactive mechanism that requires the edge to observe the Origin’s health or response state.
- Most edge rule engines are optimized for fast, stateless matching on the incoming request, whereas waiting for a response to decide on a new path adds latency and requires the edge to hold the client connection open while it makes a second upstream request.
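The distinction above can be sketched in a few lines. This is a minimal illustration, not Cloudflare's rule engine: the origin hostnames, the `/media/` prefix, and the function names are all assumptions made up for the example. Routing and rewriting are pure functions of the request, while the failover check needs a response to exist first.

```javascript
// Hypothetical origins, used only for illustration.
const PRIMARY_ORIGIN = "https://www.example.com";
const STORAGE_ORIGIN = "https://my-storage-bucket.s3.amazonaws.com";

// Routing: pick WHERE the request goes, using request metadata only.
function route(url) {
  return url.pathname.startsWith("/media/") ? STORAGE_ORIGIN : PRIMARY_ORIGIN;
}

// Rewriting: change WHAT the URL looks like, again using the request only.
function rewrite(url) {
  const rewritten = new URL(url);
  rewritten.pathname = url.pathname.replace(/^\/media\//, "/assets/");
  return rewritten;
}

// Failover: cannot be evaluated until an origin response is available.
function needsFailover(response) {
  return response.status === 404;
}

const incoming = new URL("https://www.example.com/media/logo.png?v=2");
const target = new URL(rewrite(incoming).pathname + incoming.search, route(incoming));
console.log(target.href);
// prints "https://my-storage-bucket.s3.amazonaws.com/assets/logo.png?v=2"
```

Note that `route` and `rewrite` can run before any network call, whereas `needsFailover` has nothing to inspect until the primary has already answered.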
Real-World Impact
- Increased Origin Load: If the “failover” is handled via .htaccess or application code, every “miss” still costs a round trip to the primary server and application processing there, plus a TCP handshake and TLS negotiation whenever no existing connection can be reused.
- Increased Latency: The user waits for the primary server to fail (404) before the second request is even initiated.
- Cascading Failures: If a primary origin is struggling, high volumes of 404s (looking for missing assets) can saturate the connection pool, preventing legitimate traffic from being processed.
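To make these costs concrete, here is a small back-of-the-envelope model. Every number in it is hypothetical, and the model deliberately ignores caching of successful responses; it only shows how the sequential 404-then-fallback path adds up, and how much miss traffic still reaches the primary.

```javascript
// All inputs are hypothetical; timings are in milliseconds.

// Reactive failover is sequential: wait for the 404, then ask the fallback.
function missLatency({ primaryMs, fallbackMs }) {
  return primaryMs + fallbackMs;
}

// In this simplified model every hit reaches the primary, and a miss reaches
// it unless a cached 404 at the edge answers first.
function primaryRequestsPerSecond({ requestsPerSecond, missRatio, edge404HitRatio }) {
  const hits = requestsPerSecond * (1 - missRatio);
  const misses = requestsPerSecond * missRatio * (1 - edge404HitRatio);
  return hits + misses;
}

console.log(missLatency({ primaryMs: 80, fallbackMs: 60 }));
// prints 140
console.log(primaryRequestsPerSecond({ requestsPerSecond: 1000, missRatio: 0.25, edge404HitRatio: 0.5 }));
// prints 875
```

With no cached 404s (`edge404HitRatio: 0`), all 1000 requests per second would reach the primary, including the 250 that it can only answer with a 404.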
Example or Code
To solve this in Cloudflare, one must move away from static rules and toward Cloudflare Workers, which allow for intercepting the Response object.
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  // 1. Attempt to fetch from the primary origin
  const response = await fetch(request)

  // 2. Fall back only on a 404 for a safe, body-less method; a failed POST
  //    should not be silently turned into a GET against the bucket
  const retryable = request.method === 'GET' || request.method === 'HEAD'
  if (response.status === 404 && retryable) {
    const url = new URL(request.url)
    // 3. Construct the new URL for the secondary origin (e.g., S3)
    const fallbackUrl = `https://my-storage-bucket.s3.amazonaws.com${url.pathname}${url.search}`
    // 4. Fetch from the secondary origin, keeping the method, and return that response instead
    return fetch(fallbackUrl, { method: request.method })
  }

  // 5. Otherwise, return the original response (200, 301, 500, etc.)
  return response
}
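The same logic can be written so that it runs outside the Workers runtime, which makes it easier to test. The sketch below is one possible variant, not the only way to structure it: `fetchWithFallback`, `stubFetch`, and the `www.example.com` host are names invented for this example, and the fetch function is passed in as a parameter so a stub can stand in for both origins. It assumes a runtime with global `Request` and `Response` (Workers, or a recent Node.js).

```javascript
// Hypothetical bucket host, matching the Worker example.
const FALLBACK_ORIGIN = "https://my-storage-bucket.s3.amazonaws.com";

// fetchImpl is injected so the logic can be exercised without a network.
async function fetchWithFallback(request, fetchImpl) {
  const primary = await fetchImpl(request);
  const retryable = request.method === "GET" || request.method === "HEAD";
  if (primary.status !== 404 || !retryable) {
    return primary; // pass through everything that is not a retryable 404
  }
  const url = new URL(request.url);
  const fallbackUrl = `${FALLBACK_ORIGIN}${url.pathname}${url.search}`;
  return fetchImpl(new Request(fallbackUrl, { method: request.method }));
}

// Stub standing in for both origins: the primary host always answers 404.
async function stubFetch(request) {
  const host = new URL(request.url).hostname;
  return host === "www.example.com"
    ? new Response("not found", { status: 404 })
    : new Response("from fallback", { status: 200 });
}

async function demo() {
  const response = await fetchWithFallback(
    new Request("https://www.example.com/img/logo.png?v=2"),
    stubFetch
  );
  return `${response.status} ${await response.text()}`;
}

demo().then(result => console.log(result));
// prints "200 from fallback"
```

In a Worker, the handler would call something like `fetchWithFallback(event.request, (req) => fetch(req))`. One caveat worth checking against your own bucket policy: S3 can answer 403 rather than 404 for a missing key when the caller lacks permission to list the bucket, so the fallback's own "not found" may not look like a 404.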
How Senior Engineers Fix It
A senior engineer looks beyond the immediate “how do I write this rule” and evaluates the architectural cost:
- Shift Left (Proactive Routing): If the content distribution is predictable, use a manifest-based approach or a Service Worker to know which assets live where before the request is made.
- Edge Logic (Cloudflare Workers): Use Compute-at-Edge to intercept the response. This is the direct solution to the problem of “Response-based Rewriting.”
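- Negative Caching and Origin Shielding: Make sure 404 responses are cached at the edge with a short TTL, and use a Tiered Cache or a dedicated Origin Shield so that fewer edge locations contact the primary at all. Together these prevent repeated trips to the primary origin for missing files.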
- Architecture Alignment: If the requirement is strictly “Origin Grouping” (failover), they might evaluate whether the workload is better suited to a CDN that has this specific primitive built into the control plane, such as AWS CloudFront with Origin Groups or another CDN with an equivalent feature.
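The "Shift Left" option can be sketched as a lookup that runs before any fetch. This is a minimal illustration under stated assumptions: the manifest contents, the origin hostnames, and the `resolveUpstream` name are all hypothetical, and a real manifest would more likely be generated at deploy time or stored in a key-value store than hard-coded.

```javascript
// Hypothetical manifest: exact paths known to exist only in object storage.
const OFFLOADED_PATHS = new Set(["/downloads/report-2023.pdf", "/img/hero.webp"]);

// Hypothetical origins, used only for illustration.
const ORIGINS = {
  primary: "https://www.example.com",
  storage: "https://my-storage-bucket.s3.amazonaws.com",
};

// Decide the upstream URL from the request alone, before any fetch happens,
// so a known-offloaded asset never touches the primary origin.
function resolveUpstream(requestUrl) {
  const url = new URL(requestUrl);
  const origin = OFFLOADED_PATHS.has(url.pathname) ? ORIGINS.storage : ORIGINS.primary;
  return `${origin}${url.pathname}${url.search}`;
}

console.log(resolveUpstream("https://www.example.com/img/hero.webp?w=800"));
// prints "https://my-storage-bucket.s3.amazonaws.com/img/hero.webp?w=800"
console.log(resolveUpstream("https://www.example.com/about"));
// prints "https://www.example.com/about"
```

The trade-off is that the manifest must be kept in sync with what is actually in storage; a stale entry sends requests to the wrong origin, which is why this approach suits predictable content distribution.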
Why Juniors Miss It
- Focus on Configuration, not Lifecycle: Juniors often search for a “setting” or a “checkbox” in the dashboard, assuming all logic can be handled by static configuration.
- Ignoring the Request/Response Cycle: They often forget that a 404 is a Response event, while a Rewrite Rule is a Request event. They attempt to apply request-time logic to response-time problems.
- Overlooking Latency Penalties: They might implement the .htaccess solution because it “works,” failing to realize that it adds a second round trip for every missing asset and increases the compute cost of the primary origin.