[PATCH 0/3] linux-aio: limit the batch size to reduce queue latency
From: Stefano Garzarella
Subject: [PATCH 0/3] linux-aio: limit the batch size to reduce queue latency
Date: Wed, 7 Jul 2021 17:00:16 +0200
This series adds a new `aio-max-batch` parameter to IOThread, and uses it in the
Linux AIO backend to limit the batch size (the number of requests submitted to
the kernel through io_submit(2)).
Commit 2558cb8dd4 ("linux-aio: increasing MAX_EVENTS to a larger hardcoded
value") changed MAX_EVENTS from 128 to 1024, to increase the number of
in-flight requests. But this change also increased the potential maximum batch
to 1024 elements.
The problem is noticeable when we have a lot of requests in flight and multiple
queues attached to the same AIO context.
In this case we can potentially create very large batches. With a single
queue, by contrast, the batch is limited because io_submit(2) is called
whenever the queue is unplugged.
In practice, before this series, io_submit(2) was called only when no queues
remained plugged or when the AIO queue was full (MAX_EVENTS = 1024).
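
To illustrate the effect described above, here is a minimal, self-contained
sketch of the submission rule. It is a simplified model, not the code from
this series: `BatchModel`, the `model_*` helpers and `MODEL_MAX_EVENTS` are
hypothetical names, and io_submit(2) is only counted, never called. It assumes
that requests are flushed either when the last queue is unplugged or when the
number of pending requests reaches the batch limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MAX_EVENTS: the limit without aio-max-batch. */
#define MODEL_MAX_EVENTS 1024

/* Simplified model of one AIO context shared by several queues. */
typedef struct {
    size_t pending;   /* requests queued but not yet submitted */
    size_t plugged;   /* number of queues currently plugged */
    size_t max_batch; /* batch limit; models `aio-max-batch` */
    size_t submits;   /* number of simulated io_submit(2) calls */
    size_t largest;   /* largest batch submitted so far */
} BatchModel;

/* Simulate one io_submit(2) call that flushes all pending requests. */
static void model_submit(BatchModel *m)
{
    if (m->pending == 0) {
        return;
    }
    if (m->pending > m->largest) {
        m->largest = m->pending;
    }
    m->submits++;
    m->pending = 0;
}

static void model_plug(BatchModel *m)
{
    m->plugged++;
}

/* Only the last unplug triggers a submission. */
static void model_unplug(BatchModel *m)
{
    assert(m->plugged > 0);
    if (--m->plugged == 0) {
        model_submit(m);
    }
}

/* Queue one request; flush early once the batch limit is reached. */
static void model_enqueue(BatchModel *m)
{
    m->pending++;
    if (m->plugged == 0 || m->pending >= m->max_batch) {
        model_submit(m);
    }
}

/*
 * Plug `nqueues` queues, enqueue `per_queue` requests for each of them,
 * unplug them all, and return the largest batch that was submitted.
 */
size_t model_largest_batch(size_t max_batch, size_t nqueues, size_t per_queue)
{
    BatchModel m = { .max_batch = max_batch };

    assert(max_batch > 0);
    for (size_t q = 0; q < nqueues; q++) {
        model_plug(&m);
    }
    for (size_t i = 0; i < nqueues * per_queue; i++) {
        model_enqueue(&m);
    }
    for (size_t q = 0; q < nqueues; q++) {
        model_unplug(&m);
    }
    return m.largest;
}
```

In this model, 4 plugged queues with 128 requests each produce a single batch
of 512 requests when the limit is MODEL_MAX_EVENTS, while a limit of 32 caps
every batch at 32 requests. The real backend has more state to consider (for
example a blocked queue), which this sketch deliberately leaves out.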
I ran some benchmarks to choose 32 as the default batch value for Linux AIO.
Below are the kIOPS measured with fio running in the guest (average over 3 runs):
                    |   master  |            with this series applied           |
                    |687f9f7834e| maxbatch=8|maxbatch=16|maxbatch=32|maxbatch=64|
 # queues           |  1q | 4qs |  1q | 4qs |  1q | 4qs |  1q | 4qs |  1q | 4qs |
 -- randread tests -|-----------------------------------------------------------|
 bs=4k iodepth=1    | 193 | 188 | 204 | 198 | 194 | 202 | 201 | 213 | 195 | 201 |
 bs=4k iodepth=8    | 241 | 265 | 247 | 248 | 249 | 250 | 257 | 269 | 270 | 240 |
 bs=4k iodepth=64   | 216 | 202 | 257 | 269 | 269 | 256 | 258 | 271 | 254 | 251 |
 bs=4k iodepth=128  | 212 | 177 | 267 | 253 | 285 | 271 | 245 | 281 | 255 | 269 |
 bs=16k iodepth=1   | 130 | 133 | 137 | 137 | 130 | 130 | 130 | 130 | 130 | 130 |
 bs=16k iodepth=8   | 130 | 137 | 144 | 137 | 131 | 130 | 131 | 131 | 130 | 131 |
 bs=16k iodepth=64  | 130 | 104 | 137 | 134 | 131 | 128 | 131 | 128 | 137 | 128 |
 bs=16k iodepth=128 | 130 | 101 | 137 | 134 | 131 | 129 | 131 | 129 | 138 | 129 |

1q = virtio-blk device with a single queue
4qs = virtio-blk device with multiple queues (one queue per vCPU, 4 in total)
I reported only the most significant tests, but I also ran other tests to
make sure there were no regressions. The full report is here:
https://docs.google.com/spreadsheets/d/11X3_5FJu7pnMTlf4ZatRDvsnU9K3EPj6Mn3aJIsE4tI
Test environment:
- Disk: Intel Corporation NVMe Datacenter SSD [Optane]
- CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
- QEMU: qemu-system-x86_64 -machine q35,accel=kvm -smp 4 -m 4096 \
          ... \
          -object iothread,id=iothread0,aio-max-batch=${MAX_BATCH} \
          -device virtio-blk-pci,iothread=iothread0,num-queues=${NUM_QUEUES}
- benchmark: fio --ioengine=libaio --thread --group_reporting \
                 --number_ios=200000 --direct=1 --filename=/dev/vdb \
                 --rw=${TEST} --bs=${BS} --iodepth=${IODEPTH} --numjobs=16
Next steps:
- benchmark io_uring and use `aio-max-batch` also there
- make MAX_EVENTS configurable by adding a new `aio-max-events` parameter
Comments and suggestions are welcome :-)
Thanks,
Stefano
Stefano Garzarella (3):
  iothread: generalize iothread_set_param/iothread_get_param
  iothread: add aio-max-batch parameter
  linux-aio: limit the batch size using `aio-max-batch` parameter

 qapi/misc.json            |  6 ++-
 qapi/qom.json             |  7 +++-
 include/block/aio.h       | 12 ++++++
 include/sysemu/iothread.h |  3 ++
 block/linux-aio.c         |  6 ++-
 iothread.c                | 82 ++++++++++++++++++++++++++++++++++-----
 monitor/hmp-cmds.c        |  2 +
 util/aio-posix.c          | 12 ++++++
 util/aio-win32.c          |  5 +++
 util/async.c              |  2 +
 qemu-options.hx           |  8 +++-
 11 files changed, 131 insertions(+), 14 deletions(-)
--
2.31.1