How to make a SQL query on a field which can be NULL and values, and only chose the largest value if possible

# Postmortem: Incorrect Handling of NULL in SQL MAX Queries

## Summary
An incorrect SQL query attempted to retrieve records with the largest non-NULL value in column `fd`, falling back to NULL values when no non-NULL data existed. The query failed due to improper NULL handling in comparison logic, yielding no results when NULL values were present.

## Root Cause
- SQL's `WHERE fd = NULL` comparison fails because `NULL` comparisons always return `UNKNOWN` per SQL's ternary logic.
- The `COALESCE` wrapped a subquery returning `NULL`, making the outer query equivalent to `WHERE fd = NULL`.
- Using equality (`=`) for `NULL` checks is invalid; `IS NULL` must be used instead.

## Why This Happens in Real Systems
- Legacy databases allowing optional fields (e.g., `last_login_date`)
- Data import pipelines that leave fields unpopulated
- Evolving schemas where columns transition from required to optional
- Hybrid datasets mixing numeric strings (e.g., "100") and NULLs

## Real-World Impact
- **Silent data omission**: Queries returning unexpectedly empty results
- **Reporting flaws**: Missing records in critical business metrics
- **NullPointer risks**: Downstream application crashes when non-NULL results are assumed
- **Data corruption**: Accidental overwrites when "missing" records aren't processed

## Example or Code

### Faulty Original:
```sql
SELECT * FROM TB 
WHERE fd = COALESCE((SELECT MAX(fd) FROM TB WHERE fd IS NOT NULL), NULL);

Corrected Version:

WITH MaxValue AS (
  SELECT MAX(fd) AS max_fd FROM TB WHERE fd IS NOT NULL
)
SELECT * FROM TB
WHERE 
  (fd = (SELECT max_fd FROM MaxValue)) 
  OR 
  ((SELECT max_fd FROM MaxValue) IS NULL AND fd IS NULL);

How Senior Engineers Fix It

  1. Avoid inline aggregation: Separate MAX calculation using CTE/temporary table
  2. Explicit NULL branching: Handle NULL and non-NULL cases in mutually exclusive conditions
  3. Leverage EXISTS: Verify non-NULL presence before applying MAX
  4. Index optimization: Create filtered index on fd WHERE fd IS NOT NULL
  5. Boundary testing: Validate behavior with datasets containing:
    • Only NULLs
    • Mixed NULLs and values
    • Single non-NULL value

Why Juniors Miss It

  • Misunderstanding of NULL as a value rather than “unknown”
  • Assumption that COALESCE magically handles all NULL contexts
  • Over-reliance on simplistic examples lacking NULL datasets
  • Forgetting that aggregate functions (like MAX) ignore NULLs by default
  • Confusing IS NULL with = NULL due to ORM abstraction layers