July 1, 2025

Addressing MySQL Replication Lag Due to Time Drift

Investigate and resolve MySQL replication lag anomalies triggered by time discrepancies between primary and secondary servers. Learn practical solutions for maintaining accurate Seconds_Behind_Master metrics.

Introduction

As database administrators, addressing replication lag is a common challenge. This article examines an unusual lag pattern observed in a production environment, where delays spiked and recovered abruptly.

Problem Phenomenon

Monitoring exposed an irregular replication lag pattern:
https://via.placeholder.com/600x400?text=Replication+Lag+Visualization

Key observations:

  1. Sudden Spikes: Lag surged to ~71 seconds before dropping to 0 within seconds.
  2. Periodicity: The pattern repeated every few minutes, resembling a "sawtooth" curve.

Using a shell script to log Seconds_Behind_Master every second:

for i in {1..300}; do
    echo `date` `mysql -S /data/mysql/3306/data/mysqld.sock -uxxx -pxxx -e 'show slave status\G' | grep Second` >> /tmp/second.log
    sleep 1
done

The log revealed oscillations between 0 and 71:

2023-04-17 14:20:00 0  
2023-04-17 14:20:01 71  
2023-04-17 14:20:02 0  
...

Root Cause Investigation

Initial theories pointed to resource contention or query overload, but the abrupt lag fluctuations dismissed these. Deeper analysis identified:

  1. Time Drift: The secondary server’s clock lagged behind the primary by ​71 seconds.
  2. Seconds_Behind_Master Calculation:
    • MySQL computes lag by comparing the secondary’s system time against the primary’s binary log timestamps.
    • If the secondary’s clock adjusts after the IO thread initializes, Seconds_Behind_Master becomes unreliable.

Critical Insight:

"Time discrepancies post-IO thread startup will skew Seconds_Behind_Master calculations." — MySQL Documentation

Solution

To resolve the issue:

  1. Synchronize System Time:
    • Deploy NTP to align all cluster nodes’ clocks.
    • Avoid abrupt time changes; use gradual adjustments (e.g., ntpd instead of manual date commands).
  2. Restart the IO Thread:
    • After time alignment, restart the secondary’s IO thread to recalculate the time offset.
  3. Operational Precautions:
    • Halt application writes during time adjustments if dependent on NOW() or CURDATE().
    • Validate changes in a staging environment first.

Example Workflow:

-- On Secondary Server
STOP SLAVE IO_THREAD;
-- Gradually adjust system time (if necessary)
START SLAVE IO_THREAD;

Key Takeaways

  • Time Consistency: Critical for accurate replication monitoring.
  • Metrics Validation: Cross-check Seconds_Behind_Master with binary log positions and SHOW SLAVE STATUS.
  • Change Coordination: Align time adjustments with application teams to prevent data inconsistencies.

For further details, refer to MySQL’s documentation on replication timing mechanisms.

You will get best features of ChatDBA