Stale Oracle 11g stats causing empty results on first query run

Summary

Oracle 11g returned an empty result set on the first execution of a seemingly correct query, and only produced rows on subsequent runs.
The issue involved an inner join against table R000 and a conditional WHERE clause that later caused the optimizer to choose a different plan. Changing the join to a left outer join resolved the problem, but adding WHERE R000.FKLZ IS NOT NULL re‑introduced the failure.

The root cause was a stale optimizer statistic on R000 that led the planner to pick a plan that discarded the rows, followed by a “warm‑up” scenario where the same query generated a different plan after the first run.


Root Cause

  • Outdated or missing statistics on R000 caused the optimizer to misestimate cardinality.
  • The optimizer cache reused a plan that assumed a very low selectivity for the join condition.
  • After the first execution, the runtime statistics were refreshed implicitly, prompting a new plan that actually returned rows.
  • Adding a WHERE R000.FKLZ IS NOT NULL filtered out the hidden rows because the earlier plan had already eliminated them.

Why This Happens in Real Systems

  • Statistical drift: Production workloads change, but tables don’t get re‑stats frequently.
  • Cache behavior: Oracle may keep a plan for a query in the shared pool; subsequent runs use the cached plan until the cache is purged.
  • First‑run cold cache: The initial execution might not hit the costs expected because of uninitialized statistics.
  • Implicit plan changes: A small modification (e.g., changing an INNER JOIN to LEFT OUTER JOIN) can trigger a totally different plan.

Real-World Impact

  • Zero availability of critical data for the first run of a batch job.
  • Data inconsistency in downstream processes that depend on the query.
  • Unexpected downtime for dashboards or analytics that rely on the query.
  • Increased support load as operators investigate intermittent “no rows” errors.

Example or Code (if necessary and relevant)

SELECT RC201.FKLZ RC201_FKLZ,
       RC201.NR RC201_NR,
       -- ... other columns omitted for brevity ...
       R000.FKFS
FROM   RC201
JOIN   RC220 ON RC201.CUST_FPLZ = RC220.CUST_FPLZ
JOIN   R000 ON RC201.FKLZ = R000.FKLZ
WHERE  ( R000.FKFS < 7 AND
         (rc201.cust_fpfs IS NULL OR (rc201.cust_fpfs  0)) AND
         (rc220.cust_fpfs IS NULL OR (rc220.cust_fpfs  0))
       )
  OR ( RC201.cust_fpfs >= 7 AND active = 1 )
  OR ( rc220.cust_fpfs >= 7 AND active = 1 )
  OR ( R000.FKFS >= 7 AND active = 1 )
  AND RC201.CUST_FPFS != 2
  AND (rc201.fklz != 'UNPROD' or rc201.active = 1)
  -- problematic condition when uncommented
  -- AND R000.FKLZ IS NOT NULL

How Senior Engineers Fix It

  • Refresh statistics on all involved tables, especially R000:

    EXEC DBMS_STATS.GATHER_TABLE_STATS('SCHEMA', 'R000', estimate_percent => 100);
  • Force a new plan with a query rewrite or plan guide if needed:

    DBMS_XPLAN.DISPLAY_CURSOR(fcur => 'last', fplan => 'ALL');
  • Use a binder variable or opt_param to encourage a fresh optimiser estimation:

    ALTER SESSION SET optimizer_features_enable = '11.2.0.4';
  • Schedule regular stats jobs tied to data change thresholds.

  • Remove hidden filters (e.g., WHERE R000.FKLZ IS NOT NULL) and ensure the join semantics match the intended logic.


Why Juniors Miss It

  • Assuming stats are always accurate; they rarely check when encountering intermittent failures.
  • Overlooking shared pool plan caching; they may think each execution is independent.
  • Underestimating the impact of subtle WHERE clauses that interact with join predicates.
  • Not using tools like DBMS_XPLAN or SQL_TRACE to diagnose plan changes.

By systematically reviewing statistics, plan caching behavior, and join logic, senior engineers isolate and resolve such elusive one‑off query issues.

Leave a Comment