Summary
pg_upgrade failed when upgrading a PostgreSQL server from v14 to v16 due to incompatible extensions (pgvector and hydra) causing schema restoration issues during the upgrade process.
Root Cause
The root cause was inconsistent schema definitions between PostgreSQL v14 and v16 for the installed extensions (pgvector and hydra). Specifically, the attislocal column of the pg_attribute catalog, which records whether a column is defined locally in a table rather than only inherited from a parent, was not handled properly when the schema was restored during the binary upgrade.
Why This Happens in Real Systems
- Extension Incompatibility: Extensions like pgvector and hydra may not be fully compatible across major PostgreSQL versions.
- Schema Changes: PostgreSQL v16 introduced changes in how inherited columns and catalog tables are handled, leading to conflicts during binary upgrades.
- Insufficient Testing: Pre-upgrade checks do not always detect subtle schema inconsistencies introduced by extensions.
Real-World Impact
- Downtime: The failed upgrade caused extended downtime for the database server.
- Data Integrity Risk: Partial upgrades can leave the database in an inconsistent state, risking data corruption.
- Resource Waste: Time and effort spent on troubleshooting and rollback.
Example or Code
-- Example of conflicting schema update during pg_upgrade
UPDATE pg_catalog.pg_attribute
SET attislocal = false
WHERE attname = 'dspp_conflict_cd'
AND attrelid = '"big_cust"."purchase_extension_1_82"'::regclass;
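To gauge how many tables a schema dump touches this way, statements of the form shown above can be pulled out with a regular expression. This is a minimal sketch, assuming the dump uses the same statement shape as the example; the function name `find_attislocal_updates` and the optional `pg_catalog.` qualifier on `regclass` are assumptions for illustration, not part of any PostgreSQL tool.

```python
import re

# Matches the statement form shown in the example above. Allowing an optional
# "pg_catalog." prefix on regclass is an assumption about how a dump may
# qualify the type name.
ATTISLOCAL_UPDATE = re.compile(
    r"UPDATE\s+pg_catalog\.pg_attribute\s+"
    r"SET\s+attislocal\s*=\s*false\s+"
    r"WHERE\s+attname\s*=\s*'(?P<column>[^']+)'\s+"
    r"AND\s+attrelid\s*=\s*'(?P<relation>[^']+)'::(?:pg_catalog\.)?regclass",
    re.IGNORECASE,
)


def find_attislocal_updates(sql_text: str) -> list[tuple[str, str]]:
    """Return (relation, column) pairs for every attislocal update found."""
    return [
        (match.group("relation"), match.group("column"))
        for match in ATTISLOCAL_UPDATE.finditer(sql_text)
    ]


if __name__ == "__main__":
    # The statement from the example above, used as sample input.
    sample = """
    UPDATE pg_catalog.pg_attribute
    SET attislocal = false
    WHERE attname = 'dspp_conflict_cd'
    AND attrelid = '"big_cust"."purchase_extension_1_82"'::regclass;
    """
    for relation, column in find_attislocal_updates(sample):
        print(relation, column)
```

Grouping the resulting pairs by relation gives a quick list of tables worth inspecting before the next upgrade attempt.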
How Senior Engineers Fix It
- Verify Extension Compatibility: Ensure all extensions are compatible with the target PostgreSQL version.
- Manual Schema Migration: Perform manual schema adjustments before running pg_upgrade.
- Use Logical Backup: Consider using pg_dump and pg_restore for logical upgrades instead of binary upgrades.
- Test Thoroughly: Run pg_upgrade in a staging environment with identical extensions and schema.
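The compatibility and staging steps above can be partly scripted. The sketch below is one possible pre-flight helper, not an established tool: `missing_extensions` and `build_check_command` are hypothetical names, and the directory paths and version strings in the usage section are placeholders. It assumes the installed extensions were read from `pg_extension` on the old cluster and the available ones from `pg_available_extensions` on the new cluster.

```python
def missing_extensions(installed_old: dict[str, str], available_new: set[str]) -> list[str]:
    """List extensions installed on the old cluster but absent on the new one.

    installed_old: extension name -> version, e.g. from
        SELECT extname, extversion FROM pg_extension;
    available_new: extension names, e.g. from
        SELECT name FROM pg_available_extensions;
    """
    return sorted(name for name in installed_old if name not in available_new)


def build_check_command(old_bindir: str, new_bindir: str,
                        old_datadir: str, new_datadir: str) -> list[str]:
    """Build a pg_upgrade invocation in check mode, which reports problems
    without modifying either cluster."""
    return [
        "pg_upgrade",
        "--old-bindir", old_bindir,
        "--new-bindir", new_bindir,
        "--old-datadir", old_datadir,
        "--new-datadir", new_datadir,
        "--check",
    ]


if __name__ == "__main__":
    # Illustrative values only; real names and versions come from the queries above.
    old = {"plpgsql": "1.0", "vector": "0.5.1"}
    new = {"plpgsql"}
    print("Not available on target:", missing_extensions(old, new))

    # Placeholder paths; adjust to the actual installation layout.
    cmd = build_check_command(
        "/path/to/pg14/bin", "/path/to/pg16/bin",
        "/path/to/pg14/data", "/path/to/pg16/data",
    )
    print(" ".join(cmd))
```

A passing check run does not prove that extension-owned schema objects will restore cleanly, so it complements the staging rehearsal rather than replacing it.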
Why Juniors Miss It
- Overreliance on Checks: Assuming that passing the pre-upgrade checks guarantees success, without considering extension compatibility.
- Lack of Experience: Limited exposure to major version upgrades and extension-related pitfalls.
- Insufficient Logging Analysis: Failing to correlate error logs with specific schema or extension issues.
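Correlating errors with the statements that caused them is easier with a small script. The following is a rough sketch that assumes the restore log pairs an `ERROR:` line with a following `Command was:` line; that layout should be confirmed against the actual log files before relying on it. The function name `failed_statements` is hypothetical.

```python
def failed_statements(log_text: str) -> list[tuple[str, str]]:
    """Pair each error message with the first line of the command that failed.

    Assumes each failure appears as a line containing "ERROR:" followed later
    by a line starting with "Command was:". Only the first line of a
    multi-line command is captured.
    """
    failures = []
    pending_error = None
    for line in log_text.splitlines():
        if "ERROR:" in line:
            # Keep only the message text after the ERROR: tag.
            pending_error = line.split("ERROR:", 1)[1].strip()
        elif line.startswith("Command was:") and pending_error is not None:
            command = line[len("Command was:"):].strip()
            failures.append((pending_error, command))
            pending_error = None
    return failures
```

Searching the captured commands for schema or extension object names shows quickly whether the failures cluster around one extension.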