PostgreSQL 18 has been officially released, packed with numerous improvements. One major architectural change is Asynchronous I/O (AIO), which enables the asynchronous scheduling of I/O operations. This grants the database better control over storage resources and improves storage utilization.

This article will not delve deeply into how AIO works or present exhaustive benchmark results. Its primary goal is to share tuning recommendations for AIO in PostgreSQL 18 and explain some inherent, non-obvious trade-offs and limitations.

Ideally, these tuning suggestions should be incorporated into the official documentation, but that requires a clear consensus based on practical experience. As a new feature, AIO currently lacks sufficient real-world validation data. Although extensive benchmarks were conducted during development to set the default parameters, this cannot replace the experience of actual production systems. Therefore, this article will discuss how to (possibly) adjust the default parameters and the trade-offs involved, based on personal experience.
io_method / io_workers
There is a series of parameters related to AIO (or I/O in general). However, you likely only need to focus on these two introduced in Postgres 18:
io_method = worker (options: sync, io_uring)
io_workers = 3
Other parameters (like io_combine_limit) have reasonable defaults. I don't have strong recommendations for tuning them yet, so it's best to keep them as-is for now. This article will focus on these two key parameters.
io_method
The io_method setting determines how AIO requests are actually handled: which process performs the I/O and how it is scheduled. It has three possible values:
- sync - This is a "backwards-compatible" option, using synchronous I/O with posix_fadvise where supported. This prefetches data into the page cache, not the shared buffers.
- worker - Creates a pool of "I/O worker processes" to perform the actual I/O. When a backend process needs to read a block from a data file, it inserts a request into a queue in shared memory. An I/O worker process is woken up, performs the pread operation, places the data into the shared buffers, and notifies the backend process.
- io_uring - Each backend process has an io_uring instance (a pair of queues) and uses it to perform I/O. The difference from worker is that instead of executing pread directly, it submits requests via io_uring.
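As a quick sanity check, the active method can be inspected and changed like any other server setting. A minimal sketch (note that io_method only takes effect at server start, so a configuration reload is not enough):

```
-- Show which I/O method the running server is using
SHOW io_method;

-- Switch methods (takes effect only after a server restart)
ALTER SYSTEM SET io_method = 'worker';
```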
The default value is io_method = worker. We did consider making either sync or io_uring the default, but I believe worker was the correct choice. It is truly "asynchronous" and works everywhere (since it's our own implementation). sync was considered as a "fallback" option in case we encountered issues during the beta/RC phase. But we didn't have problems, and it's unclear if using sync would even be helpful, as it still goes through the AIO infrastructure. You can still use sync if you want to simulate the behavior of older versions.
io_uring is a popular method for asynchronous I/O (not just for disks). It is excellent, efficient, and lightweight. However, it is Linux-specific, and we need to support many platforms. We could have used platform-specific defaults (similar to wal_sync_method), but that seemed unnecessarily complex.
Note: Even on Linux, validating io_uring can be tricky. Some container runtimes (e.g., containerd) previously disabled io_uring support due to security risks.
No single io_method option is "universally optimal." There will always be workloads where A is better than B, and vice versa. Ultimately, we hope most systems will use and benefit from AIO, and we wanted to keep things simple, so we chose worker.
💡 **Suggestion:** My recommendation is to stick with io_method = worker and adjust the io_workers value (described in the next section).
io_workers
Postgres defaults are very conservative. It can even start on small machines like a Raspberry Pi. On the other hand, this conservative configuration performs poorly on typical database servers, which usually have more RAM/CPU. To get good performance on such large machines, you need to tune some parameters (shared_buffers, max_wal_size, etc.). I wish we had an automated way to choose "appropriate" initial values for these basic parameters, but it's more difficult than it seems. It largely depends on the context (e.g., other things might be running on the same system). At least there are tools like PGTune that provide reasonable recommendations for these parameters.

This also applies to the default value of io_workers = 3, which only creates 3 I/O worker processes. This might be acceptable for a small machine with 8 cores, but it is definitely insufficient for a machine with 128 cores.
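Raising the setting itself is straightforward. A minimal sketch (io_workers is marked reloadable in PostgreSQL 18, so a restart should not be needed; the value 16 is purely illustrative, not a recommendation):

```
ALTER SYSTEM SET io_workers = 16;  -- illustrative value
SELECT pg_reload_conf();
SHOW io_workers;
```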
I can demonstrate this with results from a benchmark I ran to help select the default io_method. This benchmark generated a synthetic dataset and then ran queries matching parts of the data (while forcing the use of a specific scan type).

Note: This benchmark (along with scripts, numerous results, and a more detailed explanation) was initially shared in a pgsql-hackers mailing list thread about the default io_method. Please refer to that thread for more details and feedback from others. The results shown are from a small workstation with a Ryzen 9900X (12 cores / 24 threads) and 4 NVMe SSDs (configured in RAID0).

The following chart compares query execution times for different io_method options:
(Chart description: Each color represents a different io_method value (17 stands for "Postgres 17"). For the "worker" method, there are two data series corresponding to different numbers of worker processes (3 and 12). This is shown for two datasets: "uniform" - uniform distribution (so I/O is completely random), and "linear_10" - sequential distribution with a bit of randomness (imperfect correlation).)
The chart shows some very interesting phenomena:
- **Index Scan:** The io_method has no impact, which is understandable because index scans do not yet use AIO (all I/O is synchronous).
- **Bitmap Scan:** The behavior is more chaotic. The worker method performs best, but only when there are 12 worker processes. With the default 3 worker processes, its performance is actually poor for low-selectivity queries.
- **Sequential Scan:** There is a clear difference between methods. worker is the fastest, about twice as fast as sync (and PG17). io_uring falls in between.
In a chart with a logarithmic scale on the Y-axis, the performance disadvantage of the worker mode with io_workers = 3 in bitmap scan scenarios is more evident: that configuration is consistently the slowest (this is almost imperceptible in the linear chart).
The good news is that while I/O worker processes are not free, their overhead is not excessive. Therefore, having too many workers is generally better than having too few. In the future, we might start/stop worker processes on demand, making them "adaptive." This would allow us to always maintain an optimal number of processes. There is even a patch in progress for this, but it wasn't included in Postgres 18.
**Suggestion:** Consider increasing io_workers. There isn't an ideal recommended value or formula yet, but perhaps setting it to about 1/4 of the number of CPU cores is a viable option?
Trade-offs
A one-size-fits-all optimal configuration does not exist. I have seen suggestions to "use io_uring for maximum efficiency," but the benchmark above clearly shows that for sequential scans, io_uring is significantly slower than worker. Don't get me wrong, I recognize that io_uring is an excellent interface, and the aforementioned suggestion is not "wrong." Any performance tuning advice is inherently a simplification; there will always be counterexamples. The real world is never as simple as advice suggests: the point of such advice is to hide the underlying complexity behind a concise rule.
So, what are the trade-offs and differences between these asynchronous I/O methods?
Bandwidth
A major difference between io_uring and worker lies in where the tasks are executed. For io_uring, all tasks are executed within the backend process itself; for worker, these tasks are handled in separate processes. This can have noteworthy implications for bandwidth, depending on the overhead of processing the I/O. This overhead can be significant because it involves:
- The actual I/O operation
- Checksum verification (enabled by default in Postgres 18)
- Copying data into the shared buffers
For io_uring, all of this happens within the backend process itself. The I/O part might be more efficient, but the checksum verification and memory copying (memcpy) steps can become performance bottlenecks. For worker, this work is effectively distributed among the worker processes. If you have 1 backend process and 3 worker processes, the limit is increased by a factor of 3. Of course, the converse is also true. With 16 connections, for io_uring, that's 16 processes that can verify checksums, etc. For worker, the limit is the value set for io_workers. This is why I suggest setting io_workers to about 25% of the core count. I think it could even be set higher, possibly up to one I/O worker per core. In any case, 3 seems clearly too low.
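The scaling argument can be made concrete with a toy model. The 3 GB/s per-process rate and 12 GB/s device bandwidth below are made-up illustrative numbers, not measurements:

```python
def aggregate_read_gbps(working_processes: int, per_process_gbps: float,
                        device_gbps: float) -> float:
    """Toy model: throughput is capped both by the device and by the processes
    doing the post-I/O work (checksum verification + memcpy)."""
    return min(device_gbps, working_processes * per_process_gbps)

# One busy backend with the worker method: the I/O workers share the post-I/O work.
print(aggregate_read_gbps(3, 3.0, 12.0))   # io_workers = 3  -> capped at 9.0 GB/s
print(aggregate_read_gbps(12, 3.0, 12.0))  # io_workers = 12 -> device-bound, 12.0 GB/s
# The same backend with io_uring does all of that work itself.
print(aggregate_read_gbps(1, 3.0, 12.0))   # -> 3.0 GB/s
```

With many concurrent backends the comparison flips, since each io_uring backend brings its own post-I/O capacity while worker stays capped at io_workers.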
Note: I believe this ability to spread the overhead across multiple processes is the reason worker outperforms io_uring on sequential scans. A difference of around 20% seems plausible for checksum verification and memory copying in this benchmark.
Signaling
Another important detail is the overhead of inter-process communication (IPC) between backend processes and I/O worker processes, which is based on UNIX signals. The execution flow for a single I/O operation is as follows:
- The backend process adds a read request to a queue in shared memory.
- The backend process signals an I/O worker process to wake it up.
- The I/O worker process performs the I/O requested by the backend and copies the data into the shared buffers.
- The I/O worker process signals the backend process to notify it that the I/O is complete.
In the worst case, this means one "round-trip signal" (2 signals in total) is required for every 8KB data block processed. The problem is that signaling is not "zero-cost": there is a limit to the number of signals a process can handle per second. I wrote a simple benchmark to test the performance of signal passing between two processes. On my machine, the results showed it could reach 250,000 to 500,000 round trips per second. If each 8KB block requires one round trip, this translates to a transfer rate of only 2-4 GB/s. This is not particularly fast, especially considering the data might already be in the page cache, not just cold data read from storage. According to a test copying data from the page cache, a single process can achieve 10-20 GB/s, which is about 4 times faster than the signaling method. Clearly, signaling could become a performance bottleneck.
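A minimal version of such a ping-pong benchmark might look like the sketch below. This is an illustration of the idea, not the author's actual script; it relies on POSIX signals (Unix-only), and the numbers it prints vary wildly between machines.

```python
import os
import signal
import time

ROUNDS = 20_000
BLOCK = 8192  # one 8KB page per round trip, as in the worst case above

# Block SIGUSR1 before forking so neither side can lose a signal,
# then wait for it synchronously with sigwaitinfo() instead of a handler.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})

child = os.fork()
if child == 0:  # child: echo every signal straight back to the parent
    parent = os.getppid()
    for _ in range(ROUNDS):
        signal.sigwaitinfo({signal.SIGUSR1})
        os.kill(parent, signal.SIGUSR1)
    os._exit(0)

start = time.perf_counter()
for _ in range(ROUNDS):
    os.kill(child, signal.SIGUSR1)        # "wake the I/O worker"
    signal.sigwaitinfo({signal.SIGUSR1})  # "wait for I/O completion"
elapsed = time.perf_counter() - start
os.waitpid(child, 0)

rate = ROUNDS / elapsed
print(f"{rate:,.0f} round trips/s -> {rate * BLOCK / 1e9:.2f} GB/s "
      f"at one 8KB block per round trip")
```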
Note: The specific limits vary by hardware and can be much lower on older machines. But this general observation held true on all machines I had access to.
The good news is that this only affects "worst-case" workloads that require reading 8KB pages one by one. Most regular workloads are not like this. Backends often find many buffers in shared memory (thus requiring no I/O). Or, due to read-ahead, I/O happens in larger chunks, amortizing the signaling cost over multiple blocks. Therefore, I don't consider this a serious issue likely to arise frequently. There is a longer discussion about the overhead of AIO (not just due to signaling) in the mailing list thread about index prefetching.
File Limits
io_uring does not require any IPC, so it is not subject to signaling overhead or similar issues. However, io_uring also has its own limitations, just in different places. For instance, each process is subject to "per-process bandwidth limits" (e.g., how much memory copying a single process can perform). But judging by the page cache test, these limits are quite high: around 10-20 GB/s. Another consideration is that io_uring might require a considerable number of file descriptors. As explained in this pgsql-hackers thread:

The issue is that with io_uring, we need to create one file descriptor (FD) per possible child process so that one backend process can wait for I/O initiated by another backend to complete. These io_uring instances need to be created in the postmaster so that all backends can access them. Obviously, if max_connections is set high, this helps hit the unadjusted soft RLIMIT_NOFILE limit faster.
Therefore, if you decide to use io_uring, you might also need to adjust ulimit -n.
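For example, checking and raising the soft limit might look like this (the values are illustrative; the right number depends on max_connections and the rest of your setup):

```
# Current soft limit on open file descriptors for this shell
ulimit -Sn

# Raise it for the current session (illustrative value; must not exceed the hard limit)
# ulimit -Sn 65536

# For a systemd-managed Postgres, set it in the service unit instead:
#   [Service]
#   LimitNOFILE=262144
```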
Note: This is not the only place in the Postgres code where you might hit file descriptor limits. About a year ago, I proposed a patch idea related to file descriptor caching. Each backend keeps open file descriptors up to max_files_per_process, which is set to 1000 by default. This was sufficient in the past, but with partitioning (or per-tenant schemas), it's easy to trigger frequent and costly open/close calls. That is a separate but similar issue.
Summary
AIO is a major architectural change in PostgreSQL 18, but it currently has limitations: it only supports read operations, and some operations still rely on the old synchronous I/O mechanism. These limitations are not permanent and are expected to be addressed step by step in future versions. Based on the analysis in this article, the final AIO tuning recommendations are as follows:
- **Keep the default io_method = worker:** Unless benchmarking proves io_uring is superior for your specific workload, switching is not recommended. Use sync only if you need to simulate PostgreSQL 17 behavior (even though it may lead to performance degradation in some scenarios).
- **Adjust io_workers based on CPU cores:** Start with a configuration of about 25% of the core count, and consider increasing it up to 100% in I/O-intensive scenarios.
If you discover interesting conclusions during your tuning process, feel free to provide feedback to the author, and it is even more recommended to post your experiences to the pgsql-hackers mailing list. These experiences will help improve the tuning recommendations in the official documentation in the future.