June 4, 2025

MySQL Resource Isolation with Cgroups: A DBA's Guide

Learn how to implement CPU, memory, I/O, and network isolation for MySQL instances using Linux Cgroups to prevent resource contention issues.

1. Background

In daily operations, resource contention often degrades cluster stability and takes significant effort to troubleshoot. To address this pain point, we explore Cgroups[1] for resource isolation. The mechanism is widely used in containerization and can be adapted for MySQL management.

2. Implementing Resource Isolation with Cgroups

2.1 CPU Limitations

Three common modes:

  1. Priority: Assigns relative weights (cpu.shares); under contention, higher-weighted groups get more CPU time
  2. Usage Time: Caps CPU time per scheduling period (quota divided by period equals the number of cores allowed)
  3. Core Binding: Pins processes to specific CPU cores (cpuset)
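
For the usage-time mode, the quota is simply the desired core count multiplied by the scheduling period. The sketch below illustrates that arithmetic; cpu_quota_us is a hypothetical helper name, and the 100000 µs default period is an assumption that matches the value used later in the tests.

```shell
# Hypothetical helper: quota (in microseconds) that allows N cores' worth of
# CPU time per scheduling period. Usage: cpu_quota_us CORES [PERIOD_US]
cpu_quota_us() (
    cores=$1
    period=${2:-100000}   # assumed default period: 100 ms
    echo $(( cores * period ))
)

cpu_quota_us 4    # prints 400000
```

The printed value is what would be written to cpu.cfs_quota_us for the group.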

2.2 Memory Limits

Cgroups enforce memory ceilings. Note:

  • Threshold must exceed current usage
  • Swap adds complexity (disabled in our production environment)
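
The first note can be turned into a pre-check before a limit is applied. This is a minimal sketch: limit_exceeds_usage is a hypothetical helper, and the usage figure is an assumed sample value rather than a measurement.

```shell
# Hypothetical helper: succeeds only if the proposed limit (bytes) is above
# the group's current usage (bytes).
limit_exceeds_usage() {
    [ "$1" -gt "$2" ]
}

# Literal values for illustration; on a live host the usage would be read from
# /sys/fs/cgroup/memory/<group>/memory.usage_in_bytes
limit=$(( 2 * 1024 * 1024 * 1024 ))   # 2G
usage=1610612736                      # assumed sample: 1.5G in use
if limit_exceeds_usage "$limit" "$usage"; then
    echo "ok to apply $limit"
else
    echo "limit too low"
fi
```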

2.3 I/O Limitations

Two approaches:

  1. Bandwidth: Read/write throughput limits
  2. IOPS: Operations per second limits

Key Insight: Combine both for effective control. Example:

  • A 200MB/s bandwidth limit plus a 5,000 IOPS limit: the bandwidth cap contains large sequential I/O, and the IOPS cap contains floods of small requests, so neither extreme can monopolize the device

Warning: Identify block device numbers carefully (lsblk -o NAME,MAJ:MIN).
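
The throttle files take one "MAJ:MIN value" rule per device, with bandwidth expressed in bytes per second. The helpers below are hypothetical and only build those strings, so nothing is written to the cgroup filesystem; 8:16 is just an example device number.

```shell
# Hypothetical helpers: build the rule strings for the blkio.throttle.* files.
# bps_rule DEVICE MB_PER_S    -> "DEVICE bytes_per_second"
# iops_rule DEVICE IOPS       -> "DEVICE iops"
bps_rule() (
    dev=$1
    mb_per_s=$2
    echo "$dev $(( mb_per_s * 1024 * 1024 ))"
)
iops_rule() {
    echo "$1 $2"
}

bps_rule 8:16 200     # prints: 8:16 209715200
iops_rule 8:16 5000   # prints: 8:16 5000
```

Printing the rule first makes it easy to compare the device number against the lsblk output before anything is applied.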

2.4 Network Limitations

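Cgroups only tag a group's packets (net_cls); the rate limit itself is enforced by tc:

  • Egress: Supported natively by tc
  • Ingress: Requires redirecting traffic to a virtual NIC and shaping it there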

Critical: Always remove forwarding rules BEFORE deleting virtual NICs to avoid server disconnection.
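
One way to keep that ordering safe is to generate the teardown commands and review them before running anything. The sketch below is a dry run only; ingress_teardown is a hypothetical helper, and the device names are assumptions (an ifb-based virtual NIC on top of a physical interface).

```shell
# Hypothetical dry-run helper: prints the teardown commands in a safe order
# instead of executing them. Usage: ingress_teardown PHYS_DEV VIRT_DEV
ingress_teardown() {
    # 1. Drop the ingress qdisc first; this also removes the redirect filter.
    echo "tc qdisc del dev $1 ingress"
    # 2. Only then take down and unload the virtual NIC.
    echo "ip link set dev $2 down"
    echo "modprobe -r ifb"
}

ingress_teardown bond0 ifb0
```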

3. Cgroups with MySQL

3.1 Test Environment

  • Hardware: 48-core CPU, 187GB RAM, CentOS 7.9
  • MySQL: 8.0.35 with innodb_buffer_pool_size=4G, innodb_flush_method=O_DIRECT

Sysbench database and account:

CREATE DATABASE sbtest;
CREATE USER sysbench@'10.%' IDENTIFIED BY 'sysbench';
GRANT ALL ON sbtest.* TO sysbench@'10.%';

3.2 CPU Restriction Tests

Unrestricted: ~2,940 TPS, ~58,800 QPS, with mysqld using about 31 cores. CPU usage was sampled with top:

for ((i=0;i<15;i++));do top -p 3614 -b -n 1;sleep 10;done|grep mysql

4-core Limit:

cgcreate -g cpu:/mysql23761
echo 100000 > /sys/fs/cgroup/cpu/mysql23761/cpu.cfs_period_us
echo 400000 > /sys/fs/cgroup/cpu/mysql23761/cpu.cfs_quota_us  # 4 cores
cgclassify -g cpu:mysql23761 3614

Result: ~843 TPS, ~16,861 QPS (71% reduction)
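
To confirm that a drop like this comes from the quota rather than from something else, the group's cpu.stat file reports how many scheduling periods were throttled. The helper below is a hypothetical sketch that parses that file's format; the sample numbers are made up for illustration.

```shell
# Hypothetical helper: reads cpu.stat content on stdin and prints the
# percentage of scheduling periods in which the group was throttled.
throttled_pct() {
    awk '$1 == "nr_periods"   { p = $2 }
         $1 == "nr_throttled" { t = $2 }
         END { if (p > 0) printf "%d\n", t * 100 / p; else print 0 }'
}

# Sample input with made-up numbers; on a live host use:
#   throttled_pct < /sys/fs/cgroup/cpu/mysql23761/cpu.stat
printf 'nr_periods 1000\nnr_throttled 850\nthrottled_time 123456789\n' | throttled_pct
```

A value close to 100 suggests the workload is pinned against its quota for most of the test.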

3.3 Memory Restriction

2GB Limit Test:

cgcreate -g memory:/mysql23761
echo 2G > /sys/fs/cgroup/memory/mysql23761/memory.limit_in_bytes
echo 0 > /sys/fs/cgroup/memory/mysql23761/memory.oom_control  # 0 = OOM killer stays enabled
cgclassify -g memory:mysql23761 36310

Observation: MySQL OOM-killed during large data load, as expected.
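
Whether the process is killed or merely paused at the limit depends on the oom_kill_disable flag in memory.oom_control. The sketch below parses that file's contents; oom_killer_enabled is a hypothetical helper, and the sample input assumes the two-field layout used by Cgroups v1.

```shell
# Hypothetical helper: reads memory.oom_control content on stdin and succeeds
# only if the file reports oom_kill_disable 0 (OOM killer enabled).
oom_killer_enabled() {
    awk '$1 == "oom_kill_disable" { found = 1; v = $2 }
         END { exit !(found && v == 0) }'
}

# Sample input; on a live host use:
#   oom_killer_enabled < /sys/fs/cgroup/memory/mysql23761/memory.oom_control
printf 'oom_kill_disable 0\nunder_oom 0\n' | oom_killer_enabled && echo "OOM killer enabled"
```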

3.4 I/O Restrictions

10MB/s Write Bandwidth:

cgcreate -g blkio:/mysql23761
echo "8:16 10485760" >> /sys/fs/cgroup/blkio/mysql23761/blkio.throttle.write_bps_device
cgclassify -g blkio:mysql23761 47719

Result: Throughput dropped from 350MB/s to 12MB/s

500 IOPS Limit:

echo "8:16 500" >> /sys/fs/cgroup/blkio/mysql23761/blkio.throttle.write_iops_device

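Note: In Cgroups v1, blkio throttling does not apply to buffered writes flushed from the page cache, so MySQL must use O_DIRECT (which bypasses the page cache) for these limits to take effect[2].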

3.5 Network Restrictions

Egress (100Mbit):

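cgcreate -g net_cls:/mysql23761
# classid 0x100001 corresponds to tc class 10:1 (major 0x0010, minor 0x0001)
echo 0x100001 > /sys/fs/cgroup/net_cls/mysql23761/net_cls.classid
tc qdisc add dev bond0 root handle 10: htb
tc class add dev bond0 parent 10: classid 10:1 htb rate 100mbit ceil 100mbit
tc filter add dev bond0 parent 10: protocol ip prio 10 handle 1: cgroup
# attach mysqld to the group; set MYSQL_PID to the mysqld process ID
cgclassify -g net_cls:mysql23761 "$MYSQL_PID"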

Result: TPS dropped from 3,300 to 280
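
The net_cls.classid value packs the tc class handle into one number: the major part in the upper 16 bits and the minor part in the lower 16 bits. The sketch below shows that encoding; net_cls_classid is a hypothetical helper name.

```shell
# Hypothetical helper: builds the net_cls classid for tc class MAJOR:MINOR.
# tc handles are hexadecimal, so both arguments are read as hex digits.
net_cls_classid() {
    printf '0x%x\n' $(( (0x$1 << 16) | 0x$2 ))
}

net_cls_classid 10 1   # prints 0x100001
```

The output for class 10:1 matches the value written to net_cls.classid in the egress test.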

Ingress (50Mbit):

modprobe ifb numifbs=1
ip link set dev ifb0 up
tc qdisc add dev bond0 handle ffff: ingress
tc filter add dev bond0 parent ffff: protocol ip u32 match u32 0 0 action mirred egress redirect dev ifb0

Result: TPS reduced to 1,800
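
The commands above redirect incoming traffic to ifb0; the rate limit is then applied on ifb0 itself. The exact shaping commands are not shown, so the following is one common approach rather than necessarily the one used in this test. ifb_rate_limit is a hypothetical dry-run helper that prints the commands instead of running them.

```shell
# Hypothetical dry-run helper: prints (does not run) tc commands that would cap
# traffic passing through the virtual NIC. Usage: ifb_rate_limit VIRT_DEV RATE
ifb_rate_limit() {
    # Root HTB qdisc; unclassified traffic falls into class 1:10.
    echo "tc qdisc add dev $1 root handle 1: htb default 10"
    # Single class that caps all redirected traffic at RATE.
    echo "tc class add dev $1 parent 1: classid 1:10 htb rate $2 ceil $2"
}

ifb_rate_limit ifb0 50mbit
```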

4. Key Findings

  1. I/O restrictions may cause high system load (a load average of 20+ with a single throttled instance)
  2. Non-O_DIRECT I/O limits require Cgroups v2 (kernel ≥4.5)
  3. Comprehensive monitoring and emergency plans are essential for production use
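
For the second finding, Cgroups v2 replaces the separate blkio.throttle.* files with a single io.max file that takes key=value limits per device. The sketch below only builds such a rule; io_max_rule is a hypothetical helper, and the device number and limits are example values.

```shell
# Hypothetical helper: builds a Cgroups v2 io.max rule with write limits.
# Usage: io_max_rule DEVICE WRITE_BYTES_PER_S WRITE_IOPS
io_max_rule() {
    echo "$1 wbps=$2 wiops=$3"
}

io_max_rule 8:16 10485760 500   # prints: 8:16 wbps=10485760 wiops=500
```

On a v2 host the printed rule would be written to the group's io.max file.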

References

[1] Cgroups: https://en.wikipedia.org/wiki/Cgroups
[2] O_DIRECT Mode: https://andrestc.com/post/cgroups-io/
