Stale Oracle 11g stats causing empty results on first query run

Summary

Oracle 11g returned an empty result set on the first execution of a seemingly correct query, and only produced rows on subsequent runs.
The issue involved an inner join against table R000 and a conditional WHERE clause that later caused the optimizer to choose a different plan. Changing the join to a left outer join resolved the problem, but adding WHERE R000.FKLZ IS NOT NULL re‑introduced the failure.

The root cause was a stale optimizer statistic on R000 that led the planner to pick a plan that discarded the rows, followed by a “warm‑up” scenario where the same query generated a different plan after the first run.

Root Cause

Outdated or missing statistics on R000 caused the optimizer to misestimate cardinality.
The optimizer cache reused a plan that assumed a very low selectivity for the join condition.
After the first execution, the runtime statistics were refreshed implicitly, prompting a new plan that actually returned rows.
Adding a WHERE R000.FKLZ IS NOT NULL filtered out the hidden rows because the earlier plan had already eliminated them.

Why This Happens in Real Systems

Statistical drift: Production workloads change, but tables don’t get re‑stats frequently.
Cache behavior: Oracle may keep a plan for a query in the shared pool; subsequent runs use the cached plan until the cache is purged.
First‑run cold cache: The initial execution might not hit the costs expected because of uninitialized statistics.
Implicit plan changes: A small modification (e.g., changing an INNER JOIN to LEFT OUTER JOIN) can trigger a totally different plan.

Real-World Impact

Zero availability of critical data for the first run of a batch job.
Data inconsistency in downstream processes that depend on the query.
Unexpected downtime for dashboards or analytics that rely on the query.
Increased support load as operators investigate intermittent “no rows” errors.

Example or Code (if necessary and relevant)

SELECT RC201.FKLZ RC201_FKLZ,
       RC201.NR RC201_NR,
       -- ... other columns omitted for brevity ...
       R000.FKFS
FROM   RC201
JOIN   RC220 ON RC201.CUST_FPLZ = RC220.CUST_FPLZ
JOIN   R000 ON RC201.FKLZ = R000.FKLZ
WHERE  ( R000.FKFS < 7 AND
         (rc201.cust_fpfs IS NULL OR (rc201.cust_fpfs  0)) AND
         (rc220.cust_fpfs IS NULL OR (rc220.cust_fpfs  0))
       )
  OR ( RC201.cust_fpfs >= 7 AND active = 1 )
  OR ( rc220.cust_fpfs >= 7 AND active = 1 )
  OR ( R000.FKFS >= 7 AND active = 1 )
  AND RC201.CUST_FPFS != 2
  AND (rc201.fklz != 'UNPROD' or rc201.active = 1)
  -- problematic condition when uncommented
  -- AND R000.FKLZ IS NOT NULL

How Senior Engineers Fix It

Refresh statistics on all involved tables, especially R000:

EXEC DBMS_STATS.GATHER_TABLE_STATS('SCHEMA', 'R000', estimate_percent => 100);

Force a new plan with a query rewrite or plan guide if needed:
```
DBMS_XPLAN.DISPLAY_CURSOR(fcur => 'last', fplan => 'ALL');
```
Use a binder variable or opt_param to encourage a fresh optimiser estimation:
```
ALTER SESSION SET optimizer_features_enable = '11.2.0.4';
```
Schedule regular stats jobs tied to data change thresholds.
Remove hidden filters (e.g., WHERE R000.FKLZ IS NOT NULL) and ensure the join semantics match the intended logic.

Why Juniors Miss It

Assuming stats are always accurate; they rarely check when encountering intermittent failures.
Overlooking shared pool plan caching; they may think each execution is independent.
Underestimating the impact of subtle WHERE clauses that interact with join predicates.
Not using tools like DBMS_XPLAN or SQL_TRACE to diagnose plan changes.

By systematically reviewing statistics, plan caching behavior, and join logic, senior engineers isolate and resolve such elusive one‑off query issues.