Summary
A SCD Type 2 merge is supposed to insert new dimension rows and close the old row when a change occurs.
When a merge is written incorrectly, an existing key from the staging table can be inserted a second time, creating duplicate active rows. The root cause is usually a faulty WHEN MATCHED condition that doesn’t exclude rows that are already current.
Root Cause
-
Ad-hoc merge predicate
TheWHEN MATCHEDclause often uses only the business key, e.g.AND d.key = s.key.
If the staging row exactly matches the current target row but the current row is already marked inactive, the merge will still treat it as a match and incorrectly insert a new row. -
Missing status check
No check on the target’s effective date or active flag allows duplicate “current” records to surface. -
Incorrect
UPDATE/INSERTorder
In some engines theUPDATEmay run after theINSERT, so a row that should be closed is first duplicated.
Why This Happens in Real Systems
- Legacy scripts were adapted from a simple
INSERTorUPDATEworkflow. - Performance focus leads developers to omit secondary predicates.
- Complex pipelines (e.g., ETL with multiple staging tables) make it hard to keep the merge logic concise.
- Debugging is difficult when the merge happens in a batch; duplicate rows are discovered only downstream.
Real-World Impact
- Data quality degradation – analytics see inflated counts and stale brand metrics.
- Storage waste – more rows increase disk usage and query latency.
- Audit trail confusion – wrong historical lineage breaks compliance checks.
- Client trust erosion – stakeholders report inaccurate reports.
Example or Code (if necessary and relevant)
No executable code is required for this explanation.
The article focuses on the logical flaw and how to correct it.
How Senior Engineers Fix It
-
Add an active‑flag or effective‑date check in the
WHEN MATCHEDclause:WHEN MATCHED AND T.is_active = 1 THEN UPDATE SET ... WHEN NOT MATCHED AND S.is_active = 1 THEN INSERT (...) -
Close old rows first by updating the
is_activeflag before inserting a new record:UPDATE Target SET is_active = 0 WHERE key = s.key AND is_active = 1 AND s.change_detected = 1; -
Use a CTE or temporary table to pre‑filter staging rows that differ from the current state.
-
Add a unique index on (
key,is_active) to prevent accidental duplicates at the database level.
Why Juniors Miss It
- Assumes merge works like an upsert and trusts defaults.
- Friending complex predicates feels unnecessary when the simple key equality “works”.
- Lack of testing for the no-change scenario – they test only insert and simple update cases.
- Underestimates impact of duplicate records in downstream analytics.
By structuring the merge logic to explicitly check the current row’s status and closing it before inserting a new one, senior engineers ensure clean SCD Type 2 behaviors and avoid silent duplicates.