# Postmortem: Incorrect Handling of NULL in SQL MAX Queries
## Summary
An incorrect SQL query attempted to retrieve records with the largest non-NULL value in column `fd`, falling back to NULL values when no non-NULL data existed. The query failed due to improper NULL handling in comparison logic, yielding no results when NULL values were present.
## Root Cause
- SQL's `WHERE fd = NULL` comparison fails because `NULL` comparisons always return `UNKNOWN` per SQL's ternary logic.
- The `COALESCE` wrapped a subquery returning `NULL`, making the outer query equivalent to `WHERE fd = NULL`.
- Using equality (`=`) for `NULL` checks is invalid; `IS NULL` must be used instead.
## Why This Happens in Real Systems
- Legacy databases allowing optional fields (e.g., `last_login_date`)
- Data import pipelines that leave fields unpopulated
- Evolving schemas where columns transition from required to optional
- Hybrid datasets mixing numeric strings (e.g., "100") and NULLs
## Real-World Impact
- **Silent data omission**: Queries returning unexpectedly empty results
- **Reporting flaws**: Missing records in critical business metrics
- **NullPointer risks**: Downstream application crashes when non-NULL results are assumed
- **Data corruption**: Accidental overwrites when "missing" records aren't processed
## Example or Code
### Faulty Original:
```sql
SELECT * FROM TB
WHERE fd = COALESCE((SELECT MAX(fd) FROM TB WHERE fd IS NOT NULL), NULL);
Corrected Version:
WITH MaxValue AS (
SELECT MAX(fd) AS max_fd FROM TB WHERE fd IS NOT NULL
)
SELECT * FROM TB
WHERE
(fd = (SELECT max_fd FROM MaxValue))
OR
((SELECT max_fd FROM MaxValue) IS NULL AND fd IS NULL);
How Senior Engineers Fix It
- Avoid inline aggregation: Separate MAX calculation using CTE/temporary table
- Explicit NULL branching: Handle NULL and non-NULL cases in mutually exclusive conditions
- Leverage EXISTS: Verify non-NULL presence before applying MAX
- Index optimization: Create filtered index on
fd WHERE fd IS NOT NULL
- Boundary testing: Validate behavior with datasets containing:
- Only NULLs
- Mixed NULLs and values
- Single non-NULL value
Why Juniors Miss It
- Misunderstanding of NULL as a value rather than “unknown”
- Assumption that
COALESCE magically handles all NULL contexts
- Over-reliance on simplistic examples lacking NULL datasets
- Forgetting that aggregate functions (like MAX) ignore NULLs by default
- Confusing
IS NULL with = NULL due to ORM abstraction layers