Background
In GTID-based replication, discrepancies between Primary and Secondary servers can occur silently, even when replication status appears healthy (Slave_IO_Running: Yes
and Slave_SQL_Running: Yes
). This article explores a real-world data loss incident caused by improper GTID configuration, emphasizing the critical role of AUTO_POSITION=1
.
Symptoms
- Data Inconsistency: Primary and Secondary databases diverge, but replication metrics show no errors.
- Misconfigured GTID Sets: Secondary’s
Executed_Gtid_Set
exceeds the Primary’s due to manualGTID_PURGE
manipulation. - Silent Failures: DML operations on the Primary (e.g.,
INSERT
) are not replicated, yet replication threads remain active.
Reproduction Steps
1. Reset Secondary:
STOP SLAVE;
RESET SLAVE ALL;
RESET MASTER;
2. Force GTID Mismatch:
SET @@GLOBAL.GTID_PURGED = 'c233aec0-58d3-11ec-a74a-a0a33ba8b3ff:1-1000000';
3. Start Replication in POSITION MODE:
CHANGE MASTER TO MASTER_HOST='primary_host', MASTER_AUTO_POSITION=0;
START SLAVE;
Result: Slave_IO_Running
and Slave_SQL_Running
both show Yes
, but data is not replicated.
Root Cause Analysis
- Missing
AUTO_POSITION=1
: Without this setting, GTID validation relies onMASTER_LOG_FILE
/POSITION
, allowing stale data to persist. - GTID Set Overlap: Secondary’s
Executed_Gtid_Set
(up to 1,000,000) surpassed the Primary’s (104,345), causing replication to ignore newer transactions. - Binary Log Truncation Risk: If the Primary’s binary log is lost (e.g., disk failure), the Secondary may discard valid transactions.
Solutions
1. Enable AUTO_POSITION=1
:
CHANGE MASTER TO MASTER_AUTO_POSITION=1;
This forces GTID-based synchronization, ensuring consistency across Retrieved_Gtid_Set
and Executed_Gtid_Set
.
2. Avoid Manual GTID Manipulation:
Never truncate GTID history unless you have a full data backup. Use MASTER_AUTO_POSITION=1
to prevent silent failures.
3. Monitor GTID Gaps:
Regularly check:
SHOW SLAVE STATUS\G
Ensure Retrieved_Gtid_Set
and Executed_Gtid_Set
align without gaps.
4. Rebuild Secondary if Corrupted:
In severe cases (e.g., partial replication), restore from a full backup and re-sync using mysqldump
or Percona XtraBackup
.
Key Takeaways
- GTID Consistency is Non-Negotiable: Always use
AUTO_POSITION=1
in GTID-enabled environments. - Silent Failures Are Deadly: Monitor replication lag and GTID sets proactively.
- Disaster Recovery Planning: Test failover scenarios to validate GTID alignment.