Fix Dataflow Historical Data Backfills with Micro-Batch Processing
Summary
Backfilling 365 days of compressed historical data caused critical resource exhaustion, pipeline failures, and unexpected costs. The root cause was processing too many non-splittable files in a single Dataflow job, which overloaded worker memory and produced stragglers. Splitting the workload into daily or monthly micro-batches resolved the failures and improved resource utilization.
Root …
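As a rough illustration of the micro-batch split described in the summary, the sketch below cuts a 365-day backfill into daily or monthly windows, each of which could be submitted as its own Dataflow job. The bucket name, path layout, job-name scheme, and start date are all hypothetical and not taken from the original pipeline, and launching the jobs is deliberately left out.

```python
from datetime import date, timedelta


def daily_windows(start, days):
    """Yield (window_start, window_end) pairs, one per day; window_end is exclusive."""
    for offset in range(days):
        day = start + timedelta(days=offset)
        yield day, day + timedelta(days=1)


def monthly_windows(start, end):
    """Yield (window_start, window_end) pairs cut at month boundaries; window_end is exclusive."""
    cursor = start
    while cursor < end:
        # First day of the month after the cursor.
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        window_end = min(next_month, end)
        yield cursor, window_end
        cursor = window_end


def input_pattern(day):
    """Build the input glob for one day (hypothetical bucket and path layout)."""
    return f"gs://example-bucket/events/{day:%Y/%m/%d}/*.gz"


# One small job per day instead of a single job over the whole year.
backfill_start = date(2023, 1, 1)  # assumed start date, for illustration only
jobs = [
    (f"backfill-{window_start:%Y%m%d}", input_pattern(window_start))
    for window_start, _ in daily_windows(backfill_start, 365)
]
```

Each entry in `jobs` pairs a job name with the file pattern for a single day, so one failed day can be retried without rerunning the rest. `monthly_windows` gives coarser batches when per-day jobs add too much launch overhead.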