May 26, 2025

Resolving MySQL Replication Lag from Large Transactions: Step-by-Step Fixes

Learn how to diagnose and fix MySQL replication lag with practical strategies. This guide covers analyzing large transactions, optimizing binlog size, and improving secondary server performance, illustrated by a real-world production case study.

Background

MySQL primary-secondary replication is critical for ensuring high availability and scalability. However, replication lag can lead to data inconsistencies and degrade failover capabilities. A common culprit is large transactions, especially those involving multi-table writes. This article analyzes a production incident caused by sharding operations and shares optimization insights.

Problem Statement

Developers implemented a sharding process: reading data from a single large table, processing it, and writing to 100 partitioned tables. This generated large transactions with multi-table writes, causing significant replication lag. Initial observations revealed:

  • Secondary server relay log size: 1.6 GB (exceeding the default cap of roughly 1.1 GB).
  • INFORMATION_SCHEMA.INNODB_TRX showed long-running transactions whose modified-row counts kept climbing.
  • Replication lag: 37,325 seconds (about 10.4 hours), with the secondary trailing 4 binary logs behind the primary.
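These figures come straight from the secondary's replication status, which can be read with SHOW REPLICA STATUS (MySQL 8.0.22+; earlier versions use SHOW SLAVE STATUS):

SHOW REPLICA STATUS\G
-- Key fields: Seconds_Behind_Source (the lag above), Relay_Log_Space
-- (relay backlog on disk), and Source_Log_File vs. Relay_Source_Log_File
-- (how far the applier trails the binlog the IO thread is reading).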

Troubleshooting Steps

1. Analyze Secondary Server Transactions

Checked INNODB_TRX for long-running transactions:

SELECT * FROM INFORMATION_SCHEMA.INNODB_TRX
WHERE trx_started < NOW() - INTERVAL 120 SECOND;  -- the view has no TIME column; derive runtime from trx_started

2. Inspect Relay Log Size

Identified oversized relay logs using:

ls -lh /path/to/relaylogs/mysql-relay-bin.*  

The 1.6 GB file indicated a serious backlog; because MySQL never splits a transaction across log files, a relay log that outgrows its cap points to a single oversized transaction inside it.
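The configured cap can be confirmed as follows (a max_relay_log_size of 0 means MySQL falls back to max_binlog_size):

SHOW VARIABLES LIKE 'max_relay_log_size';  -- 0 = inherit max_binlog_size
SHOW VARIABLES LIKE 'max_binlog_size';     -- default 1073741824 bytes (~1.1 GB)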

3. Parse Binary Logs for Row Changes

Used mysqlbinlog to analyze transaction impact:

mysqlbinlog --base64-output=decode-rows -vv mysql-bin.003731 | awk '{...}' > analysis.txt  

The output showed that individual row events touched only 1–4 rows each, ruling out a single massive bulk statement as the cause; the volume instead had to come from many small writes grouped into a few huge transactions.
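The awk filter is elided above; a minimal sketch that tallies the decoded row events by statement type (the "### ..." lines in mysqlbinlog's -vv output, one per row image) could look like:

# count decoded row images by statement type
mysqlbinlog --base64-output=decode-rows -vv mysql-bin.003731 \
  | awk '/^### INSERT INTO/ {i++} /^### UPDATE/ {u++} /^### DELETE FROM/ {d++}
         END {printf "inserts=%d updates=%d deletes=%d\n", i, u, d}'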

4. Evaluate Transaction Sizes

Sorted transactions by size to pinpoint outliers:

mysqlbinlog ... | grep "^BEGIN" | awk '{...}' | sort -nr | head -n 20  

Two transactions exceeded 700 MB each, totaling ~1.5 GB, which matched the relay log size.
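The pipeline above is abbreviated; a common way to approximate per-transaction size is to diff the byte offsets that mysqlbinlog prints as "# at <pos>" between each BEGIN and its matching COMMIT, for example:

# print the byte size of each transaction, largest first
mysqlbinlog mysql-bin.003731 \
  | awk '/^# at /  {pos = $3}
         /^BEGIN/  {start = pos}
         /^COMMIT/ {print pos - start}' \
  | sort -nr | head -n 20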

5. Visualize Large Transactions with my2sql

Leveraged my2sql to parse detailed transaction metadata:

./my2sql -user repuser -password ... -work-type stats -start-file mysql-bin.003731  

Results highlighted 8 transactions spanning 7 hours, each inserting 4 million rows across 100 tables.
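Depending on the build, my2sql can also flag big or long transactions directly via threshold options; the flags below follow the project README but are assumptions here, so verify them against your binary's -help output:

./my2sql -user repuser -password ... -host 127.0.0.1 -port 3306 \
  -work-type stats -start-file mysql-bin.003731 \
  -big-trx-row-limit 10000 -long-trx-seconds 300 \
  -output-dir ./binlog_stats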

Solutions

1. Split Large Transactions
Break multi-table writes into smaller batches (e.g., 10,000 rows per commit) to reduce lock contention and binary log generation. Example:

BEGIN;  
INSERT INTO db.t_sharding_01 (...) VALUES (...);  
-- Repeat for 10,000 rows  
COMMIT;  
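To drive those batches end to end, one option is a stored procedure using keyset pagination; this is a minimal sketch with hypothetical table and column names, assuming an auto-increment id on the source table:

DELIMITER //
CREATE PROCEDURE copy_in_batches()
BEGIN
  DECLARE last_id BIGINT DEFAULT 0;
  DECLARE n INT DEFAULT 1;
  WHILE n > 0 DO
    START TRANSACTION;
    -- copy the next 10,000 rows by ascending id
    INSERT INTO db.t_sharding_01 (id, payload)
      SELECT id, payload FROM db.t_source
      WHERE id > last_id
      ORDER BY id
      LIMIT 10000;
    SET n = ROW_COUNT();
    COMMIT;  -- each chunk replicates as its own small transaction
    SELECT IFNULL(MAX(id), last_id) INTO last_id FROM db.t_sharding_01;
  END WHILE;
END //
DELIMITER ;

Keeping every chunk in its own transaction caps the size of each replicated event group, so the secondary can apply them steadily instead of stalling on one giant transaction.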

2. Optimize Sharding Logic
Streamline data distribution to minimize inter-table dependencies. Implement parallel writes or batch processing where feasible.
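As a sketch of the parallel-write idea (shard routing by MOD and the table names are assumptions; in practice each job would also batch its inserts as shown above):

# one writer per shard table, 8 running concurrently
seq -w 0 99 | xargs -P 8 -I NN mysql db -e \
  "INSERT INTO t_sharding_NN (id, payload)
     SELECT id, payload FROM t_source WHERE MOD(id, 100) = NN;"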

3. Monitor and Alert
Set up alerts for:

  1. Relay log size exceeding 500 MB.
  2. Transactions running longer than 5 minutes.

Use tools like Prometheus + Grafana or Percona Monitoring and Management (PMM).
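Even without a full monitoring stack, a small cron-driven check against these thresholds helps; a minimal sketch (the relay log path is this article's placeholder):

#!/bin/sh
# total bytes of relay logs on disk
relay=$(du -sb /path/to/relaylogs | awk '{print $1}')
# count of transactions running longer than 5 minutes
long_trx=$(mysql -N -e "SELECT COUNT(*) FROM INFORMATION_SCHEMA.INNODB_TRX
                        WHERE trx_started < NOW() - INTERVAL 5 MINUTE")

[ "${relay:-0}" -gt $((500 * 1024 * 1024)) ] && echo "ALERT: relay logs exceed 500 MB"
[ "${long_trx:-0}" -gt 0 ] && echo "ALERT: ${long_trx} transaction(s) over 5 minutes"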

Key Takeaways

  • Large transactions are a leading cause of replication lag.
  • Analyze binary logs with mysqlbinlog and my2sql to identify bottlenecks.
  • Prioritize transaction granularity and monitoring to maintain replication health.
