July 25, 2025

MySQL Hangs on Startup with systemd

Explore why MySQL hangs during systemd startup, troubleshooting steps, and solutions to resolve the issue efficiently.

Introduction
As the title suggests, in an automated testing scenario, MySQL fails to start via systemd. The test repeatedly kills the instance with kill -9 and then checks whether mysqld is restarted correctly. The environment is as follows:

Host Information: CentOS 8 (Docker Container)
Using systemd to manage the mysqld process
Systemd service running mode: forking
Startup command:

sudo -S systemctl start mysqld_11690.service

ExecStart command in the systemd service:

/opt/mysql/base/8.0.34/bin/mysqld --defaults-file=/opt/mysql/etc/11690/my.cnf --daemonize --pid-file=/opt/mysql/data/11690/mysqld.pid --user=actiontech-mysql --socket=/opt/mysql/data/11690/mysqld.sock --port=11690

Symptoms
The startup command hangs indefinitely, neither succeeding nor returning any output. Attempts to manually reproduce the issue are unsuccessful. The service port number in the screenshot is inconsistent and can be ignored.

The MySQL error log shows no information. Checking the systemd service status reveals that the post-start script fails because a required parameter, the MAIN PID, is missing.

The final output from systemd is: "New main PID 31036 does not exist or is a zombie."

Root Cause
During the startup of mysqld via systemd, the following steps are executed:

  1. ExecStart (start mysqld)
  2. mysqld creates a pid file
  3. ExecStartPost (custom post-start scripts: adjusting permissions, writing pid to cgroup, etc.)

Between steps 2 and 3, an automated test command is issued: sudo -S kill -9 $(cat /opt/mysql/data/11690/mysqld.pid). Since the pid file and the process exist (otherwise, the kill command or cat would fail), the automated test case assumes the kill operation is successful. However, from systemd's perspective, it still needs to wait for step 3 to complete before considering the service as started.

In forking mode, systemd expects the process launched by ExecStart to fork and its parent to exit, and it treats the surviving child as the service's MAIN PID. If the child starts successfully and does not exit unexpectedly, systemd considers the service started once the start sequence completes. If the child fails to start or exits unexpectedly, systemd considers the service as not started.
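
One way a test harness could avoid firing inside the window between steps 2 and 3 is to wait until systemd itself reports the unit as active before sending the kill. The following is a minimal sketch, not the test suite's actual code: the function name wait_until_active and the 60-second default are assumptions made for illustration, and it relies only on systemctl is-active.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until systemd reports the unit as active.
# While ExecStartPost is still running the unit is "activating", so
# "systemctl is-active --quiet" keeps returning non-zero during that window.
wait_until_active() {
    local unit="$1" timeout="${2:-60}" elapsed=0
    until systemctl is-active --quiet "$unit"; do
        # Give up once the (assumed) timeout in seconds has passed.
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example use in a test case (paths taken from the article):
#   wait_until_active mysqld_11690.service 60 &&
#       sudo -S kill -9 "$(cat /opt/mysql/data/11690/mysqld.pid)"
```

With a guard like this, the existence of the pid file alone is no longer treated as proof that the service has finished starting.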

Conclusion
During the execution of ExecStartPost, the child process with PID 31036 is killed, so the post-start shell script does not receive the parameter it needs. Although ExecStart has completed, the MAIN PID 31036 that systemd is tracking no longer exists or is a zombie.

Troubleshooting Process
This issue was initially confusing. Basic checks on memory and disk usage showed no resource shortages.

First, check the MySQL error log for any clues. The log shows nothing useful because there are no entries after the startup time.

Check the systemctl status to confirm the service's current state:

  • The post-start shell fails due to the missing -p parameter (the -p parameter is the MAIN PID, i.e., the PID of the forked child process).
  • Systemd cannot retrieve PID 31036, which does not exist or is a zombie process.

Further checks on the process ID and mysqld.pid file confirm:

  • PID 31036 does not exist.
  • The mysqld.pid file exists and contains the value 31036.
  • The top command shows no zombie processes.
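
The checks above can be folded into a small helper. This is a sketch that assumes a Linux /proc filesystem; the function name pid_state and its output labels are illustrative and are not part of MySQL or systemd.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the process named in a PID file.
# Prints one of: no-pidfile, invalid, missing, zombie, running.
pid_state() {
    local pidfile="$1" pid state
    if [ ! -r "$pidfile" ]; then
        echo "no-pidfile"
        return 0
    fi
    # Strip whitespace and newlines so only the number remains.
    pid="$(tr -d '[:space:]' < "$pidfile")"
    case "$pid" in
        ''|*[!0-9]*) echo "invalid"; return 0 ;;
    esac
    # /proc/<pid>/status contains a line such as "State:  S (sleeping)".
    # If the process is gone, awk fails and state stays empty.
    state="$(awk '/^State:/ {print $2}' "/proc/$pid/status" 2>/dev/null || true)"
    case "$state" in
        '') echo "missing" ;;
        Z)  echo "zombie" ;;
        *)  echo "running" ;;
    esac
}

# Example (path taken from the article):
#   pid_state /opt/mysql/data/11690/mysqld.pid
```

In the situation described here, where the pid file still holds 31036 but the process is gone, a helper like this would be expected to report "missing".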

More clues are needed to confirm the cause. Check the journalctl -u content for any helpful information:

sh-4.4# journalctl -u mysqld_11690.service
-- Logs begin at Mon 2024-02-05 04:00:35 CST, end at Mon 2024-02-05 17:08:01 CST. --
Feb 05 05:07:54 udp-11 systemd[1]: Starting MySQL Server...
Feb 05 05:07:56 udp-11 systemd[1]: Started MySQL Server.
Feb 05 05:08:31 udp-11 systemd[1]: mysqld_11690.service: Main process exited, code=killed, status=9/KILL
Feb 05 05:08:31 udp-11 systemd[1]: mysqld_11690.service: Failed with result 'signal'.
Feb 05 05:08:32 udp-11 systemd[1]: Starting MySQL Server...
Feb 05 05:08:36 udp-11 systemd[1]: Started MySQL Server.
Feb 05 05:08:37 udp-11 systemd[1]: mysqld_11690.service: Main process exited, code=killed, status=9/KILL
Feb 05 05:08:37 udp-11 systemd[1]: mysqld_11690.service: Failed with result 'signal'.
Feb 05 05:08:39 udp-11 systemd[1]: Starting MySQL Server...
Feb 05 05:08:42 udp-11 u_set_iops.sh[31507]: /etc/systemd/system/mysqld_11690.service.d/u_set_iops.sh: option requires an argument -- p
Feb 05 05:08:42 udp-11 systemd[1]: mysqld_11690.service: New main PID 31036 does not exist or is a zombie.

The journalctl -u content only describes the symptoms and does not provide specific reasons, similar to the systemctl status content, offering little help.
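
When the journal is long, it can help to reduce it to the lines that mark a kill or a failed start. The filter below is only a sketch: the function name unit_lifecycle_events is made up for illustration, and the patterns are simply the message fragments that appear in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical filter: keep only the journal lines that show the main
# process being killed, the post-start script failing, or systemd
# rejecting the main PID. Reads journal text on stdin.
unit_lifecycle_events() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E 'Main process exited|New main PID|option requires an argument' || true
}

# Example (unit name taken from the article):
#   journalctl -u mysqld_11690.service --no-pager | unit_lifecycle_events
```

This does not add information to the journal; it only makes the sequence of kill, restart, and failed start easier to read.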

Next, check the /var/log/messages system log. It contains some memory-related error messages, and some research suggests that these might be hardware-related. After consulting with the automated testing team, the following is established:

  • The scenario is intermittent, with 2 out of 4 test cases succeeding and 2 failing.
  • Each test case runs on the same host and container image.
  • The container that hangs during failure is the same.

Since some runs succeed on the same host, hardware issues are set aside for now.

Considering the use of containers, cgroup mapping issues are suspected. From the systemctl status, the cgroup mapping directory is:

CGroup: /docker/3a72b2cdc7bd9beb1c7b2abec24763046604602a38f0fcb7406d17f5d33353d2/system.slice/mysqld_11690.service

Checking the read-write permissions of the parent folder system.slice shows no abnormalities. Cgroup mapping issues are temporarily ruled out (as other systemd-managed services on the host use the same cgroup).

Use pstack to see where the systemctl start command (PID 3048143) hangs:

sh-4.4# pstack 3048143
#0  0x00007fdfaef33ade in ppoll () from /lib64/libc.so.6
#1  0x00007fdfaf7768ee in bus_poll () from /usr/lib/systemd/libsystemd-shared-239.so
#2  0x00007fdfaf6a8f3d in bus_wait_for_jobs () from /usr/lib/systemd/libsystemd-shared-239.so
#3  0x000055b4c2d59b2e in start_unit ()
#4  0x00007fdfaf7457e3 in dispatch_verb () from /usr/lib/systemd/libsystemd-shared-239.so
#5  0x000055b4c2d4c2b4 in main ()

The start_unit frame looks suspicious, but it belongs to the systemctl executable itself and simply starts the unit. The stack only shows systemctl waiting for the start job to finish, which offers no further help.

Based on the available clues, it is inferred that:

  • The existence of the mysqld.pid file indicates that a mysqld process with PID 31036 was started.
  • The process was killed by the automated test case.
  • Systemd picks up a MAIN PID that has already terminated, so the post-start shell script fails and the forking startup fails.

Based on the systemd startup process, it is speculated that:

  • MySQL only generates the mysqld.pid file after the mysqld process has successfully started.
  • The process might have been killed unexpectedly in a subsequent step.

Reproduction Steps
Since there are no other clues, the inferred conclusion is tested by attempting to reproduce the issue.

4.1 Adjust the systemd mysql service template
Edit the template file /etc/systemd/system/mysqld_11690.service to add a sleep 10 command after mysqld starts, allowing a time window to simulate killing the instance process.

4.2 Reload the configuration
Execute systemctl daemon-reload to apply the changes.

4.3 Reproduce the scenario
[SSH Session A] Prepare a new container, configure it, and execute sudo -S systemctl start mysqld_11690.service to start the mysqld process. The session will hang due to the sleep command.
[SSH Session B] While the start command is hanging, check the mysqld.pid file. Once the file is created, immediately execute sudo -S kill -9 $(cat /opt/mysql/data/11690/mysqld.pid).
Check the systemctl status: the output matches the symptoms seen in the failing test runs.
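
The Session B step can also be scripted so that the kill lands as soon as the pid file appears. This is a minimal sketch under a few assumptions: the function name kill_when_pidfile_appears, the 0.1-second polling interval, and the default timeout are all illustrative, and the sleep command must accept fractional seconds (GNU coreutils does). In the article's setup the kill would need sudo, because mysqld runs as another user.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll for a non-empty PID file, then SIGKILL that PID.
# Returns 1 if the file does not appear within the timeout (in seconds).
kill_when_pidfile_appears() {
    local pidfile="$1" timeout="${2:-30}" ticks=0 pid
    while [ ! -s "$pidfile" ]; do
        # Ten polling ticks per second.
        [ "$ticks" -ge "$((timeout * 10))" ] && return 1
        sleep 0.1
        ticks=$((ticks + 1))
    done
    pid="$(tr -d '[:space:]' < "$pidfile")"
    kill -9 "$pid"
}

# Example for Session B (path taken from the article; run as root or
# adapt the kill line to use sudo):
#   kill_when_pidfile_appears /opt/mysql/data/11690/mysqld.pid 30
```

Because the sleep 10 added to the unit keeps the start sequence open, the kill issued this way falls inside the same window that the automated test hit by chance.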

Solution
First, kill the hanging systemctl start command and execute systemctl stop mysqld_11690.service to allow systemd to terminate the zombie process. Although the stop command might report an error, it does not affect the outcome.
Wait for the stop command to complete and then use the start command to restart the service, which should return to normal.
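
The recovery sequence can be sketched as a short script. The function name recover_unit is hypothetical, and the use of pkill -f to find the blocked systemctl start client is an assumption about how that command was launched; adjust the pattern if the command line differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recover a unit whose "systemctl start" is hanging.
recover_unit() {
    local unit="$1"
    # 1. Kill the client still blocked in "systemctl start <unit>".
    #    pkill returns non-zero if nothing matched, which is fine here.
    pkill -f "systemctl start $unit" || true
    # 2. Ask systemd to stop the half-started unit. As noted above, this
    #    may report an error, so its exit status is ignored.
    systemctl stop "$unit" || true
    # 3. Start the unit again; the function returns this command's status.
    systemctl start "$unit"
}

# Example (unit name taken from the article; run as root or via sudo):
#   recover_unit mysqld_11690.service
```

Since systemctl stop is synchronous by default, the start in step 3 only runs after the stop has returned.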
