Fixing non‑deterministic DEFAULT values in PolarDB MySQL 8.0 DDL

Summary

A production deployment failed when attempting to add a column to a PolarDB MySQL 8.0.1 cluster using a dynamic expression as a DEFAULT value. The operation triggered a GTID consistency error, preventing the schema change. While similar operations may succeed on standard RDS instances, PolarDB’s specific implementation of Global Transaction Identifiers (GTID) and its distributed storage architecture impose stricter requirements on how DDL statements are replicated to ensure data integrity across all nodes.

Root Cause

The error is caused by the use of a non-deterministic default expression in an ALTER TABLE statement while enforce_gtid_consistency is enabled.

  • GTID Consistency: To ensure that a transaction can be safely reapplied on a replica, every statement must be deterministic.
  • Expression Defaults: When a column is added with a DEFAULT (expression), MySQL 8.0 allows the default to be a function or another column.
  • Replication Conflict: In PolarDB’s architecture, certain dynamic expressions are flagged as non-deterministic. If the expression produces different results on the primary and the read-replica (due to timing, environment variables, or sequence), the GTID chain is broken, leading to data divergence.
  • PolarDB vs. RDS: PolarDB utilizes a shared-storage architecture and a different replication log mechanism compared to standard RDS (which typically uses standard MySQL binlogs). PolarDB enforces stricter GTID checks to prevent “silent” corruption across its distributed compute nodes.

Why This Happens in Real Systems

In high-scale distributed databases, determinism is the bedrock of consistency.

  • Clock Skew: If a default value uses NOW(), the primary and replica might record slightly different timestamps if not handled via binary logs.
  • State Dependency: Referencing another column in a DEFAULT expression during an ALTER can be problematic if the table is being modified concurrently or if the replication stream processes the change out-of-order.
  • Distributed Metadata: PolarDB manages metadata across multiple compute nodes; allowing non-deterministic DDLs could lead to a state where different nodes perceive the table schema differently.

Real-World Impact

  • Deployment Blockers: CI/CD pipelines fail during migration scripts, halting feature releases.
  • Operational Friction: Engineers may attempt to disable enforce_gtid_consistency, which is extremely dangerous as it can lead to permanent replica lag or total data divergence.
  • Inconsistent Environments: A script that works in a local MySQL dev environment or a small RDS instance fails in the production PolarDB cluster, leading to “it works on my machine” syndromes.

Example or Code (if necessary and relevant)

The failing statement:

ALTER TABLE cluster ADD COLUMN `test11` LONGTEXT DEFAULT (api_config);

The compliant approach:

ALTER TABLE cluster ADD COLUMN `test11` LONGTEXT;
UPDATE cluster SET `test11` = api_config;

How Senior Engineers Fix It

Senior engineers avoid trying to “force” the database to accept non-deterministic DDLs. Instead, they decouple the schema change from the data population.

  1. Two-Step Migration: First, add the column without the default expression. Second, run a batched UPDATE to populate the values.
  2. Batching: For large tables, they do not run a single UPDATE. They update the data in chunks (e.g., 1000 rows at a time) to avoid long-running transactions and excessive lock contention.
  3. Application Logic: They implement a “fallback” in the application code: if the new column is NULL, the application treats it as the value of api_config until the migration is complete.
  4. Virtual Columns: If the value must always mirror another column, they use a Generated Column (AS (api_config) VIRTUAL), which is natively deterministic and supported.

Why Juniors Miss It

  • Over-reliance on Syntax: Juniors often focus on whether the SQL syntax is “correct” according to the manual, rather than how that syntax interacts with the underlying replication engine.
  • Ignoring the “Why” of GTID: They view GTID as a configuration toggle rather than a safety guarantee for distributed data.
  • Lack of Scale Experience: In small datasets, a single UPDATE statement is fast. In production, they miss the need for batching and throttling, which are critical when modifying millions of rows in a PolarDB cluster.

Leave a Comment