How to resolve memory imbalance in PolarDB PG after enabling IMCI?

Summary

This postmortem analyzes a memory imbalance in PolarDB for PostgreSQL with IMCI (In-Memory Column Index) enabled: the RW (read-write) node comes under significantly higher memory pressure than the RO (read-only) nodes because of row-to-columnar conversion overhead. The imbalance leads to out-of-memory (OOM) events on RW during peak load, which hurts service stability and query latency.

Root Cause

The core issue arises from IMCI’s row‑to‑columnar conversion, which occurs only on the RW node:

  • RW must convert row‑format data into columnar format before IMCI can process it.
  • RO nodes typically read already‑converted columnar data, avoiding this overhead.
  • Conversion requires:
    • Additional CPU cycles
    • Temporary memory buffers
    • Larger executor memory footprints
  • Under high concurrency, these buffers accumulate, causing memory spikes and eventual OOM on RW.

Why This Happens in Real Systems

Memory imbalance is common in distributed PostgreSQL‑derived systems because:

  • RW nodes carry the entire write path (WAL generation, index maintenance, vacuum and other MVCC cleanup), which RO nodes do not.
  • Columnar engines add extra transformation steps that are not symmetric across nodes.
  • Query planners may choose IMCI paths aggressively, even when memory is tight.
  • Memory parameters are often tuned uniformly, ignoring RW/RO asymmetry.
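
One way to check whether memory parameters really are uniform is to run the same catalog query on the RW node and on each RO node and compare the output. This is a minimal sketch using the standard PostgreSQL pg_settings view. It assumes pg_is_in_recovery() distinguishes RO from RW in your deployment, and it covers only stock PostgreSQL parameters, since IMCI-specific ones vary by PolarDB PG version.

```sql
-- Run on the RW node and on every RO node, then compare the rows.
SELECT pg_is_in_recovery() AS is_replica,  -- assumption: true on RO, false on RW
       name,
       setting,
       unit,
       source                              -- where the value comes from (default, configuration file, user, ...)
FROM pg_settings
WHERE name IN ('work_mem',
               'hash_mem_multiplier',      -- PostgreSQL 13+; no row is returned on older versions
               'maintenance_work_mem',
               'max_parallel_workers_per_gather',
               'max_connections')
ORDER BY name;
```

Identical rows on every node suggest the settings were never adjusted for RW's extra conversion work.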

Real-World Impact

Teams typically observe:

  • RW node OOM kills, causing failovers or degraded cluster performance.
  • Increased latency for IMCI‑enabled analytical queries.
  • Unpredictable memory spikes during peak business hours.
  • RO nodes underutilized, while RW becomes the bottleneck.

Example Configuration

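Below is an example of lowering executor memory parameters to reduce RW pressure. The values are illustrative, and enable_imci is the parameter name as used in this writeup; confirm the exact name for your PolarDB PG version. On managed clusters, these parameters may have to be changed through the console rather than with ALTER SYSTEM.

-- Cap memory per sort/hash operation (a single query can run several).
ALTER SYSTEM SET work_mem = '64MB';
-- Hash nodes may use work_mem * hash_mem_multiplier (PostgreSQL 13+).
ALTER SYSTEM SET hash_mem_multiplier = 1.0;
-- Keep IMCI enabled; only the memory limits change.
ALTER SYSTEM SET enable_imci = on;
-- Apply the changes without a restart.
SELECT pg_reload_conf();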
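
To sanity-check settings like the ones above, a rough ceiling can be computed from the node's current values. This sketch assumes every connection runs exactly one sort or hash operation at its full allowance. Real usage can be higher, because a query may run several such operations and each parallel worker gets its own allowance. It is often lower, because most sessions use far less. IMCI conversion buffers are not included, because they are not governed by work_mem.

```sql
-- Rough ceiling: max_connections * work_mem * hash_mem_multiplier.
SELECT current_setting('max_connections')::int AS max_connections,
       current_setting('work_mem')             AS work_mem,
       pg_size_pretty(
           current_setting('max_connections')::int
           * pg_size_bytes(current_setting('work_mem'))
           -- missing_ok = true returns NULL before PostgreSQL 13; fall back to 1.0
           * coalesce(current_setting('hash_mem_multiplier', true)::numeric, 1.0)
       ) AS rough_ceiling;
```

If the ceiling is well above the node's physical memory, the configuration relies on most sessions staying idle or small, which is exactly what breaks down at peak load.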

How Senior Engineers Fix It

Experienced engineers approach this with multi‑layered mitigation, not a single parameter tweak.

1. Memory Configuration Adjustments

  • Lower work_mem on RW to reduce per‑query memory footprint.
  • Tune IMCI-related buffers (if your PolarDB PG version exposes them).
  • Reduce parallel workers on RW to limit concurrent conversions.
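
If node-level settings cannot be changed separately for RW, one alternative is to scope the limits to the role that runs the heavy queries. This is a sketch with a hypothetical role name (analytics_user) and illustrative values. Role-level settings apply to new sessions on whichever node the role connects to, so this targets the workload rather than the RW node itself. Whether IMCI's own parallelism honors max_parallel_workers_per_gather depends on the version and should be verified.

```sql
-- "analytics_user" is a hypothetical role name; values are illustrative.
-- Run on the RW node; takes effect for sessions opened afterwards.
ALTER ROLE analytics_user SET work_mem = '32MB';
ALTER ROLE analytics_user SET max_parallel_workers_per_gather = 2;

-- Revert to the cluster-wide value if needed:
-- ALTER ROLE analytics_user RESET work_mem;
```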

2. Query Execution Strategy

  • Encourage IMCI usage on RO nodes by:
    • Routing analytical queries to RO
    • Using query hints or proxy rules
  • Disable IMCI for certain OLTP‑heavy workloads on RW.
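
Disabling IMCI for an OLTP workload could look like the sketch below. It assumes that enable_imci (the name used in this writeup) is a real, user-settable parameter in your PolarDB PG version. If the name differs, PostgreSQL rejects the statement with an "unrecognized configuration parameter" error, and the name should be taken from the PolarDB documentation for your release. The role name oltp_app is hypothetical.

```sql
-- Assumption: "enable_imci" exists and can be set per role or per session.
-- "oltp_app" is a hypothetical role name.
ALTER ROLE oltp_app SET enable_imci = off;   -- applies to new sessions of this role

-- Or for the current session only:
SET enable_imci = off;
```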

3. Proxy Routing Best Practices

  • Route read‑heavy IMCI queries to RO whenever possible.
  • Use latency‑aware routing to avoid overloading RW.
  • Apply query‑pattern‑based routing (e.g., analytical vs transactional).

4. Architectural Improvements

  • Introduce dedicated RO nodes for IMCI workloads.
  • Use connection pooling to cap RW concurrency.
  • Monitor:
    • Memory fragmentation
    • Executor memory usage
    • IMCI conversion metrics (if available)
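
A connection pooler is the main tool for capping concurrency. A per-role connection limit can act as a database-side backstop, and pg_stat_activity shows how many sessions each role currently holds. The sketch below uses standard PostgreSQL features only. The role name and limit are hypothetical, and how the limit is counted across RW and RO nodes should be verified for your deployment.

```sql
-- Backstop behind the pooler ("analytics_user" and 20 are illustrative).
ALTER ROLE analytics_user CONNECTION LIMIT 20;

-- Snapshot of client sessions on this node, grouped by role and state.
SELECT usename,
       state,
       count(*) AS sessions
FROM pg_stat_activity
WHERE backend_type = 'client backend'   -- column available in PostgreSQL 10+
GROUP BY usename, state
ORDER BY sessions DESC;
```

Sampling the second query on RW during peak hours shows whether memory spikes line up with bursts of active analytical sessions.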

5. Leverage PolarDB Features

Senior engineers follow the PolarDB PG release notes and adopt, where their version provides them:

  • Automatic memory balancing improvements.
  • IMCI execution optimizations that reduce conversion overhead.
  • Adaptive query routing features in the proxy layer.

Why Juniors Miss It

Less experienced engineers often overlook this issue because:

  • They assume RW and RO nodes behave symmetrically, which is false with IMCI.
  • They tune memory uniformly across nodes, ignoring RW’s extra workload.
  • They focus on CPU or I/O metrics, missing executor memory spikes.
  • They rely on default proxy routing, unaware that IMCI queries must be steered deliberately.
  • They expect the database to automatically optimize IMCI memory usage, not realizing that backend improvements may lag behind real‑world workloads.

Key takeaway: IMCI introduces asymmetric memory behavior, and RW nodes require specialized tuning and routing strategies to remain stable under load.
