Phenomenon
A server monitoring alert indicated free memory <10%. Analysis via top
revealed:
top - 13:45:43 up 1835 days, 20:52, 2 users, load average: 0.02, 0.03, 0.05
Tasks: 210 total, 1 running, 208 sleeping, 1 stopped, 0 zombie
%Cpu(s): 0.5 us, 0.6 sy, 0.0 ni, 98.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32780028 total, 905684 free, 19957900 used, 11916444 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 3448260 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2677 mysql 20 0 20.1g 15.1g 3392 S 0.0 48.2 430:17.58 mysqld
10549 polkitd 20 0 3277476 3.1g 632 S 0.3 9.9 146:47.24 redis-server
18183 root 20 0 877308 215868 1892 T 2.7 0.7 2736:45 xxxxxx
442 root 20 0 160244 93016 88552 S 0.3 0.3 314:14.86 systemd-journal
32537 root 20 0 731620 58360 54588 S 0.3 0.2 29:09.61 rsyslogd
Key Observations:
- Total memory: 32GB, used: 19GB, buffer/cache: 11GB, available: 3GB.
- Top consumers: MySQL (15.1GB) and Redis (3.1GB).
- Hidden memory usage:
buff/cache
only utilized 3GB, leaving 8GB unaccounted for.
Analysis
Further investigation with free
and df
commands:
# Memory breakdown
free -m
total used free shared buff/cache available
Mem: 32011 19490 881 8762 11639 3366
Swap: 0 0 0
# Filesystem usage
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 8.6G 7.1G 55% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
Critical Findings:
- **
/run
filesystem**: Consumed 8.6GB (55% of/run
capacity). - Session files explosion:
/run/systemd/users
contained 300,000+ hidden files (e.g.,.#00009iJ
).- Oldest file dated back to 2018, indicating zombie sessions from
crond
.
Root Cause:
Stale crond
sessions (triggered by periodic tasks) generated persistent session files in /run/systemd/users
, consuming memory through inode references. These files were never cleaned up, leading to gradual memory exhaustion.
Solution
Implemented a phased approach:
1. Short-Term Mitigation
Reduce MySQL memory footprint:
Lowered innodb_buffer_pool_size
to free 2GB for the OS:
SET GLOBAL innodb_buffer_pool_size = 16G; -- From 20G
Manual cleanup of stale sessions:Identified and terminated inactive sessions via loginctl
:
# List all sessions
loginctl list-sessions | awk '{print $3}' | sort | uniq -c
# Kill specific sessions (e.g., session ID 24597)
loginctl kill-session 24597
2. Long-Term Prevention
- Automated session cleanup:
Added a cron job to purge sessions older than 30 days:
0 3 * * * /usr/bin/find /run/systemd/users/ -type f -mtime +30 -exec rm {} \;
- Adjusted cron task scheduling:
Modifiedcrond
configurations to minimize unnecessary session creation.
Key Takeaways
- Memory accounting:
buff/cache
may not reflect all memory usage; inspect/run
and/tmp
for hidden allocations. - Session management: Regularly clean up orphaned
systemd
sessions to prevent inode bloat. - Monitoring gaps: Add alerts for
/run
filesystem usage and session counts.