Summary
TRUNCATE in PostgreSQL removes all rows by dropping and recreating the heap, which bypasses the regular MVCC visibility rules. Consequently, transactions that already hold a snapshot taken before the TRUNCATE will see the table as empty, breaking the usual “snapshot‑isolation” guarantee.
Root Cause
- Heap recreation:
TRUNCATEdrops the underlying heap file and creates a new one, discarding all old tuple versions. - No new XIDs for old rows: The old tuples are not marked dead with a new transaction ID; they simply disappear.
- Snapshot taken earlier: Transactions that captured a snapshot prior to the
TRUNCATEstill expect the old heap to exist, but it has been replaced, so their view resolves to “no rows”. - Lack of a “visibility map” update: Regular DELETE/UPDATE writes visibility info to the MVCC system;
TRUNCATEskips this step.
Why This Happens in Real Systems
- Performance optimization: Dropping the heap is dramatically faster than deleting row‑by‑row, especially for large tables.
- Locking semantics:
TRUNCATEacquires anACCESS EXCLUSIVElock, which blocks new transactions but does not block those that already hold a snapshot. - Design trade‑off: PostgreSQL chooses speed and low WAL volume over strict MVCC safety for this command.
Real-World Impact
- Stale reads: Parallel queries started before the truncate may return an empty result set, leading to logical errors.
- Incorrect aggregations: Summaries that rely on a consistent view can miss data that existed at the start of the transaction.
- Application‑level race conditions: Services that assume “snapshot isolation = repeatable read” can observe phantom emptiness.
- Debugging difficulty: The effect is nondeterministic, appearing only when concurrent snapshots overlap with the truncate.
Example or Code (if necessary and relevant)
BEGIN; -- Transaction 1 takes a snapshot
SELECT COUNT(*) FROM big_table; -- sees 1,000,000 rows
-- In another session:
TRUNCATE big_table; -- drops heap, creates new empty one
-- Back to Transaction 1:
SELECT COUNT(*) FROM big_table; -- now returns 0, even though the snapshot was taken earlier
COMMIT;
How Senior Engineers Fix It
- Use DELETE instead of TRUNCATE when MVCC safety matters, despite the performance cost.
- Serialize access: Ensure no long‑running snapshots overlap the truncate, e.g., by:
- Issuing
TRUNCATEduring a maintenance window. - Using
LOCK TABLE … IN ACCESS EXCLUSIVE MODEand waiting for all existing transactions to finish.
- Issuing
- Apply logical replication or partition swapping: Load new data into a fresh partition and
DROPthe old one atomically. - Wrap in a transaction with
SET TRANSACTION ISOLATION LEVEL SERIALIZABLEonly when you can guarantee no concurrent readers. - Document the behavior clearly in code reviews and architecture docs to avoid accidental misuse.
Why Juniors Miss It
- Assume MVCC is universal: New engineers often think every DML respects snapshot isolation without exceptions.
- Overlook lock levels: They may not realize that
ACCESS EXCLUSIVEdoes not block existing snapshots. - Focus on performance: The speed advantage of
TRUNCATEmasks its semantic caveats. - Insufficient testing: Without reproducing concurrent‑snapshot scenarios, the bug remains hidden until production.