Summary
Oracle 11g returned an empty result set on the first execution of a seemingly correct query, and only produced rows on subsequent runs.
The issue involved an inner join against table R000 and a conditional WHERE clause that later caused the optimizer to choose a different plan. Changing the join to a left outer join resolved the problem, but adding WHERE R000.FKLZ IS NOT NULL re‑introduced the failure.
The root cause was a stale optimizer statistic on R000 that led the planner to pick a plan that discarded the rows, followed by a “warm‑up” scenario where the same query generated a different plan after the first run.
Root Cause
- Outdated or missing statistics on
R000caused the optimizer to misestimate cardinality. - The optimizer cache reused a plan that assumed a very low selectivity for the join condition.
- After the first execution, the runtime statistics were refreshed implicitly, prompting a new plan that actually returned rows.
- Adding a
WHERE R000.FKLZ IS NOT NULLfiltered out the hidden rows because the earlier plan had already eliminated them.
Why This Happens in Real Systems
- Statistical drift: Production workloads change, but tables don’t get re‑stats frequently.
- Cache behavior: Oracle may keep a plan for a query in the shared pool; subsequent runs use the cached plan until the cache is purged.
- First‑run cold cache: The initial execution might not hit the costs expected because of uninitialized statistics.
- Implicit plan changes: A small modification (e.g., changing an INNER JOIN to LEFT OUTER JOIN) can trigger a totally different plan.
Real-World Impact
- Zero availability of critical data for the first run of a batch job.
- Data inconsistency in downstream processes that depend on the query.
- Unexpected downtime for dashboards or analytics that rely on the query.
- Increased support load as operators investigate intermittent “no rows” errors.
Example or Code (if necessary and relevant)
SELECT RC201.FKLZ RC201_FKLZ,
RC201.NR RC201_NR,
-- ... other columns omitted for brevity ...
R000.FKFS
FROM RC201
JOIN RC220 ON RC201.CUST_FPLZ = RC220.CUST_FPLZ
JOIN R000 ON RC201.FKLZ = R000.FKLZ
WHERE ( R000.FKFS < 7 AND
(rc201.cust_fpfs IS NULL OR (rc201.cust_fpfs 0)) AND
(rc220.cust_fpfs IS NULL OR (rc220.cust_fpfs 0))
)
OR ( RC201.cust_fpfs >= 7 AND active = 1 )
OR ( rc220.cust_fpfs >= 7 AND active = 1 )
OR ( R000.FKFS >= 7 AND active = 1 )
AND RC201.CUST_FPFS != 2
AND (rc201.fklz != 'UNPROD' or rc201.active = 1)
-- problematic condition when uncommented
-- AND R000.FKLZ IS NOT NULL
How Senior Engineers Fix It
-
Refresh statistics on all involved tables, especially
R000:EXEC DBMS_STATS.GATHER_TABLE_STATS('SCHEMA', 'R000', estimate_percent => 100); -
Force a new plan with a query rewrite or plan guide if needed:
DBMS_XPLAN.DISPLAY_CURSOR(fcur => 'last', fplan => 'ALL'); -
Use a binder variable or opt_param to encourage a fresh optimiser estimation:
ALTER SESSION SET optimizer_features_enable = '11.2.0.4'; -
Schedule regular stats jobs tied to data change thresholds.
-
Remove hidden filters (e.g.,
WHERE R000.FKLZ IS NOT NULL) and ensure the join semantics match the intended logic.
Why Juniors Miss It
- Assuming stats are always accurate; they rarely check when encountering intermittent failures.
- Overlooking shared pool plan caching; they may think each execution is independent.
- Underestimating the impact of subtle WHERE clauses that interact with join predicates.
- Not using tools like
DBMS_XPLANorSQL_TRACEto diagnose plan changes.
By systematically reviewing statistics, plan caching behavior, and join logic, senior engineers isolate and resolve such elusive one‑off query issues.