Avoiding $expr Exhaustion: Scaling MongoDB Queries for 200 Million Docs

Summary

During a routine scaling operation of a high-throughput telemetry service, we encountered a critical performance degradation in a MongoDB cluster containing 200M documents. A query pattern utilizing a combination of an $in operator on a high-cardinality field (deviceId) and an $expr containing an $or clause caused CPU exhaustion and high disk I/O. The query was unable to leverage standard B-tree indexes effectively because the $expr operator forces the database to perform collection scans or expensive index intersections that do not scale linearly with data volume.

Root Cause

The failure originated from a fundamental mismatch between the query structure and how MongoDB’s query optimizer processes expressions:

  • The $expr Trap: The use of $expr allows for aggregation expressions within a $match stage. However, standard indexes are designed for direct field comparisons. When $expr is used, MongoDB often cannot use a single index to satisfy the entire predicate.
  • Predicate Complexity: The $or inside the $expr contains logic that requires the engine to evaluate computed values for every document that satisfies the initial deviceId filter.
  • Index Inefficiency: While an index might exist on deviceId, the subsequent $or logic requires a sub-optimal scan of all documents matching those IDs to evaluate the $path, $abc, and $type conditions.
  • Computational Overhead: Evaluating $eq and $in within an expression context for 200M documents leads to high CPU cycles per document, preventing the engine from using the “index-only” scan optimization.

Why This Happens in Real Systems

In distributed production environments, this issue is a byproduct of evolving schema requirements:

  • Feature Creep: As business logic changes, engineers often try to “force” complex conditional logic into existing queries rather than redesigning the data model.
  • Abstraction Leaks: Developers often treat NoSQL databases like relational engines, assuming the optimizer can “figure out” complex logical branches.
  • Scale Blindness: A query that runs in 50ms on a development machine with 10,000 documents becomes a system-killer when applied to a 200M document production collection.

Real-World Impact

  • Increased Latency: P99 latency spiked from 200ms to over 15s, triggering timeouts in upstream microservices.
  • Resource Starvation: High CPU utilization on the primary node prevented other critical operations (like heartbeats), leading to unnecessary election cycles and cluster instability.
  • Increased Costs: Attempting to “throw hardware at the problem” by increasing IOPS or RAM provided diminishing returns because the bottleneck was computational complexity, not throughput.

Example or Code

// The problematic query pattern
db.telemetry.aggregate([
  {
    $match: {
      deviceId: { $in: ["dev_123", "dev_456", "dev_789"] },
      $expr: {
        $or: [
          { $eq: ["$path", "one-particular-path"] },
          { $in: ["$abc", ["true", "false"]] },
          { $eq: ["$type", "fixed-type"] }
        ]
      }
    }
  }
]);

// The optimized approach: Pre-materializing a "search-optimized" field
// or using a flattened document structure.
db.telemetry.aggregate([
  {
    $match: {
      deviceId: { $in: ["dev_123", "dev_456", "dev_789"] },
      search_flag: "target_criteria" // Pre-calculated during ingestion
    }
  }
]);

How Senior Engineers Fix It

Senior engineers solve this by shifting the complexity from read-time to write-time:

  • Data Denormalization: Instead of calculating complex $or logic during a read, we calculate a computed boolean or enum field at the moment of ingestion.
  • Materialized Views: We implement a pattern where a secondary collection (or an on-demand materialized view) stores only the documents that meet the complex criteria.
  • Compound Indexing: If the logic can be simplified, we build a compound index that covers the equality predicates first, then the range/in predicates.
  • Schema Redesign: We evaluate if the $or logic actually indicates that the data belongs in different collections or if the schema should be flattened to avoid $expr.

Why Juniors Miss It

  • Focus on Syntax over Semantics: Juniors focus on “making the query work” (syntax) rather than “making the query efficient” (execution plan).
  • Lack of Execution Plan Analysis: They often fail to run .explain("executionStats") to see that a query is performing a COLLSCAN or an expensive FETCH stage.
  • Small Dataset Bias: They test against local datasets where the cost of a collection scan is negligible, failing to account for the logarithmic growth of cost in production-scale datasets.
  • Over-reliance on Expressiveness: They view the ability to use $expr as a feature rather than a performance penalty.

Leave a Comment