Summary
A performance degradation was observed in a time-series data retrieval pattern where queries filtered on both a timestamp range and a discrete sensor ID. The initial implementation relied on single-column indexes, which led to high CPU usage and increased query latency. The investigation revealed that the database engine was either intersecting the two indexes or range-scanning one index and filtering every returned row on the other column, rather than using a single composite index that satisfies both predicates at once.
Root Cause
The root cause was a misunderstanding of index selectivity and of how composite indexes work in distributed time-series databases.
- Individual Index Limitation: When timestamp_idx and sensor_id_idx are created separately, the engine must pick one index to narrow down the result set (or pay the cost of intersecting both).
- The “Intersection” Penalty: If the engine picks the timestamp index, it must still examine every record within that time range to check the sensor_id. If the time range is large, this results in massive I/O overhead.
- API Misuse: The developer passed single column names to createIndex, not realizing that a composite index requires an explicit multi-column definition so that the engine can traverse a single B-Tree (or equivalent) for both criteria simultaneously.
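The difference between the two access paths can be sketched with in-memory sorted maps standing in for B-tree indexes. This is a minimal simulation under stated assumptions, not GridDB code: the class name, the "sensorId|zero-padded timestamp" key encoding, and the synthetic data are all hypothetical. It counts how many index entries each plan touches for sensor_id = 'SN-2' AND timestamp in [100, 200).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simulation only (hypothetical data model, not GridDB code): sorted maps stand in for B-tree indexes.
class IndexScanDemo {
    // Single-column "timestamp index": timestamp -> sensor IDs of the rows recorded at that instant.
    static final TreeMap<Long, List<String>> timestampIndex = new TreeMap<>();
    // Composite "(sensor_id, timestamp) index": the key "sensorId|zero-padded timestamp"
    // keeps each sensor's rows contiguous and time-ordered (assumes non-negative timestamps).
    static final TreeMap<String, Long> compositeIndex = new TreeMap<>();

    static String key(String sensorId, long ts) {
        return sensorId + "|" + String.format("%012d", ts);
    }

    // Loads one reading per sensor per timestamp tick.
    static void load(int sensors, long ticks) {
        timestampIndex.clear();
        compositeIndex.clear();
        for (long ts = 0; ts < ticks; ts++) {
            for (int s = 0; s < sensors; s++) {
                String id = "SN-" + s;
                timestampIndex.computeIfAbsent(ts, k -> new ArrayList<>()).add(id);
                compositeIndex.put(key(id, ts), ts);
            }
        }
    }

    // Plan A: range-scan the timestamp index, then filter every entry on sensor_id.
    // Returns {entries examined, rows matched}.
    static long[] timestampThenFilter(String sensorId, long from, long to) {
        long examined = 0, matched = 0;
        for (List<String> ids : timestampIndex.subMap(from, true, to, false).values()) {
            for (String id : ids) {
                examined++;
                if (id.equals(sensorId)) matched++;
            }
        }
        return new long[] {examined, matched};
    }

    // Plan B: seek to (sensorId, from) in the composite index; every entry read is a match.
    static long[] compositeSeek(String sensorId, long from, long to) {
        NavigableMap<String, Long> run =
            compositeIndex.subMap(key(sensorId, from), true, key(sensorId, to), false);
        return new long[] {run.size(), run.size()};
    }

    public static void main(String[] args) {
        load(10, 1_000);
        long[] a = timestampThenFilter("SN-2", 100, 200);
        long[] b = compositeSeek("SN-2", 100, 200);
        System.out.println("timestamp index + filter: examined " + a[0] + ", matched " + a[1]); // 1000, 100
        System.out.println("composite index seek:     examined " + b[0] + ", matched " + b[1]); // 100, 100
    }
}
```

With 10 hypothetical sensors, the timestamp-only plan reads ten entries for every row it returns; that ratio grows with the number of sensors sharing the time window.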
Why This Happens in Real Systems
In distributed time-series environments like GridDB, data is often partitioned by time.
- High Cardinality vs. Low Cardinality: sensor_id might have low cardinality (few sensors), while timestamp has extremely high cardinality.
- Predicate Selectivity: A query that asks for “all data in the last 24 hours” can return millions of rows across all sensors. If you only index timestamp, the database must scan all of them to find a single sensor's readings. A composite index allows the engine to jump directly to that sensor's rows within the time range.
- Optimizer Heuristics: Database optimizers are not magic. With two separate indexes, an index intersection is often estimated to cost more than a single-index scan, so the optimizer typically picks one index and filters on the other column, and it may pick the less selective one.
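A back-of-the-envelope cost model makes the selectivity argument concrete. The workload figures below (500 sensors, one reading per sensor per second, a 24-hour window) are hypothetical, and "entries read" is a simplified proxy for I/O that ignores caching, compression, and partition pruning.

```java
// Back-of-the-envelope cost model (hypothetical workload; entries read is a simplified I/O proxy).
class SelectivityEstimate {
    // Entries read when only the timestamp index is usable: every row from every sensor in the window.
    static long timestampOnly(long sensors, long rowsPerSensorPerSec, long windowSec) {
        return sensors * rowsPerSensorPerSec * windowSec;
    }

    // Entries read with a (sensor_id, timestamp) index: only the requested sensor's rows in the window.
    static long composite(long rowsPerSensorPerSec, long windowSec) {
        return rowsPerSensorPerSec * windowSec;
    }

    public static void main(String[] args) {
        long sensors = 500, rate = 1, window = 24 * 3600; // one reading per second over 24 hours
        System.out.println("timestamp index only: " + timestampOnly(sensors, rate, window)); // 43200000
        System.out.println("composite index:      " + composite(rate, window));              // 86400
    }
}
```

Under these assumptions the timestamp-only plan reads 43,200,000 entries to return 86,400, a factor equal to the sensor count.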
Real-World Impact
- Increased Query Latency: Queries that should take milliseconds take seconds, breaking real-time dashboards.
- Resource Exhaustion: High CPU and Disk I/O usage on database nodes, potentially causing cascading failures in a distributed cluster.
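- Scaling Bottlenecks: As the dataset grows from gigabytes to terabytes, the lack of composite indexing makes query cost grow with the total volume of data in each queried time window, not just with the rows actually requested, so performance steadily degrades as data accumulates.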
Example or Code
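To fix this in GridDB, define the composite structure explicitly. In the GridDB Java API, a composite index is described by an IndexInfo that carries a list of column names, which is then passed to Container.createIndex (this assumes a GridDB release with composite index support; check the API reference for your version).
import com.toshiba.mwcloud.gs.*; // Container, IndexInfo, IndexType
IndexInfo info = new IndexInfo(); // composite index on (sensor_id, timestamp); column order matters
info.setColumnNameList(java.util.Arrays.asList("sensor_id", "timestamp"));
info.setType(IndexType.TREE);
info.setName("composite_sensor_time_idx");
container.createIndex(info);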
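To verify that the index is actually used, prefix the TQL query with EXPLAIN via the GridDB shell or management console and inspect the resulting plan (TQL compares TIMESTAMP columns against TIMESTAMP() values rather than bare strings):
EXPLAIN SELECT * WHERE sensor_id = 'SN-1002' AND timestamp > TIMESTAMP('2023-01-01T00:00:00Z')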
How Senior Engineers Fix It
- Column Ordering Strategy: Senior engineers know that order matters in composite indexes. We place the equality predicate (sensor_id = ?) before the range predicate (timestamp > ?) in the index definition. This allows the engine to seek directly to the exact sensor and then scan a contiguous time range.
- Execution Plan Analysis: We never assume an index is working. We always use EXPLAIN to confirm that the query planner is using the composite index, rather than scanning a single-column index and filtering, intersecting indexes, or falling back to a full scan.
- Workload Profiling: We analyze column cardinality before designing an index, to avoid creating bloated indexes that consume excessive memory.
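Why equality-before-range matters can be sketched by building the same data under both column orders. This is a hypothetical in-memory model (TreeSets with comparators in place of B-trees, invented sensor IDs and tick counts), not GridDB internals; it counts the entries between the seek bounds for sensor_id = 'SN-3' AND timestamp in [500, 550).

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.TreeSet;

// Simulation only (hypothetical model, not GridDB code): TreeSets stand in for
// composite B-tree indexes built with two different column orders.
class ColumnOrderDemo {
    static final class Key {
        final String sensorId;
        final long ts;
        Key(String sensorId, long ts) { this.sensorId = sensorId; this.ts = ts; }
    }

    // Index on (sensor_id, timestamp): equality column first, range column second.
    static final Comparator<Key> SENSOR_FIRST =
        Comparator.comparing((Key k) -> k.sensorId).thenComparingLong(k -> k.ts);
    // Index on (timestamp, sensor_id): range column first.
    static final Comparator<Key> TIME_FIRST =
        Comparator.comparingLong((Key k) -> k.ts).thenComparing(k -> k.sensorId);

    static NavigableSet<Key> build(Comparator<Key> order, int sensors, long ticks) {
        TreeSet<Key> index = new TreeSet<>(order);
        for (int s = 0; s < sensors; s++)
            for (long ts = 0; ts < ticks; ts++)
                index.add(new Key("SN-" + s, ts));
        return index;
    }

    // Query: sensor_id = id AND from <= timestamp < to. Returns {entries examined, rows matched}.
    static int[] scan(NavigableSet<Key> index, boolean sensorFirst, String id, long from, long to) {
        // Sensor-first: the seek bounds pin the sensor, so only its contiguous run is read.
        // Time-first: "" sorts before every sensor ID, so the bounds cover ALL sensors in the window.
        Key lo = sensorFirst ? new Key(id, from) : new Key("", from);
        Key hi = sensorFirst ? new Key(id, to) : new Key("", to);
        int examined = 0, matched = 0;
        for (Key k : index.subSet(lo, true, hi, false)) {
            examined++;
            if (k.sensorId.equals(id)) matched++; // residual filter on sensor_id
        }
        return new int[] {examined, matched};
    }

    public static void main(String[] args) {
        int[] good = scan(build(SENSOR_FIRST, 20, 1_000), true, "SN-3", 500, 550);
        int[] bad = scan(build(TIME_FIRST, 20, 1_000), false, "SN-3", 500, 550);
        System.out.println("(sensor_id, timestamp): examined " + good[0] + ", matched " + good[1]); // 50, 50
        System.out.println("(timestamp, sensor_id): examined " + bad[0] + ", matched " + bad[1]);   // 1000, 50
    }
}
```

Both orders return the same 50 rows; only the sensor-first order keeps them contiguous, so the time-first order reads 20 times as many entries in this model.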
Why Juniors Miss It
- The “More is Better” Fallacy: Juniors often believe that creating more individual indexes is better than creating one composite index.
- Ignoring Column Order: They often treat composite indexes as unordered sets, failing to realize that an index on (A, B) generally cannot be used to seek for a query filtering only on B, because B's values are sorted only within each value of A.
- Lack of Observability: They rely on “it feels fast” during small-scale testing instead of checking the database execution plan to confirm how the engine is actually traversing the data.
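The leftmost-prefix behaviour behind that mistake can be captured as a rule of thumb. The sketch below is a simplified teaching model, not any engine's actual planner logic (real optimizers add refinements such as skip scans); the class and method names are hypothetical. It reports how many leading index columns can narrow a seek for a given set of equality and range predicates.

```java
import java.util.List;
import java.util.Set;

// Simplified B-tree "leftmost prefix" rule (hypothetical teaching sketch; real planners
// add refinements such as skip scans, so treat the result as a rule of thumb).
class PrefixRule {
    // Returns how many leading index columns can narrow the seek for the given predicates.
    static int seekableColumns(List<String> indexColumns, Set<String> equalityCols, Set<String> rangeCols) {
        int usable = 0;
        for (String col : indexColumns) {
            if (equalityCols.contains(col)) {
                usable++;   // exact match: entries stay sorted on the next column within this value
            } else if (rangeCols.contains(col)) {
                usable++;   // a range can be seeked, but it ends the usable prefix
                break;
            } else {
                break;      // gap in the prefix: later columns are not contiguous, so no seek on them
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        List<String> idx = List.of("sensor_id", "timestamp");
        System.out.println(seekableColumns(idx, Set.of("sensor_id"), Set.of("timestamp"))); // 2
        System.out.println(seekableColumns(idx, Set.of("sensor_id"), Set.of()));            // 1
        System.out.println(seekableColumns(idx, Set.of(), Set.of("timestamp")));            // 0: full scan
    }
}
```

A result of 0 corresponds to the “(A, B) index, filter only on B” case: the index can still be read end to end, but it offers no seek.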