qemu-devel

Re: [PATCH v2 2/6] file-posix: try BLKSECTGET on block devices too, do not round to power of 2


From: Paolo Bonzini
Subject: Re: [PATCH v2 2/6] file-posix: try BLKSECTGET on block devices too, do not round to power of 2
Date: Mon, 31 May 2021 18:36:19 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1

On 31/05/21 15:59, Kevin Wolf wrote:
>>> Apparently the motivation for Maxim's patch was, if I'm reading the
>>> description correctly, that it affected non-sg cases by imposing
>>> unnecessary restrictions. I see that patch 1 changed the max_iov part
>>> so that it won't affect non-sg cases any more, but max_transfer could
>>> still be more restricted than necessary, no?
>>
>> Indeed the kernel puts no limit at all, but especially with O_DIRECT
>> we probably benefit from avoiding the moral equivalent of "bufferbloat".
>
> Yeah, that sounds plausible, but on the other hand the bug report Maxim
> addressed was about performance issues related to buffer sizes being too
> small. So even if we want to have some limit, max_transfer of the host
> device is probably not the right one for the general case.

Yeah, for a simple dd with O_DIRECT there is no real max_transfer, if you are willing to allocate a big enough buffer. Quick test on my laptop, reading 12.5 GiB:

   bufsize (bytes)  elapsed
   163840           9.46777s
   327680           9.41480s
   520192           9.39520s (max_iov * 4K)
   614400           9.06289s
   655360           8.85762s
   1310720          8.75502s
   2621440          8.26522s
   5242880          7.88319s
   10485760         7.66751s
   20971520         7.42627s
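
(A sketch of such a measurement, assuming the obvious approach rather than the exact harness used above, with the device path and buffer size as placeholder command-line arguments: time sequential O_DIRECT reads of the whole input with a given buffer size.)

#define _GNU_SOURCE      /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Usage: ./bench FILE BUFSIZE */
int main(int argc, char **argv)
{
    struct timespec t0, t1;
    long long total = 0;
    size_t bufsize;
    void *buf;
    int fd;

    if (argc < 3) {
        return 1;
    }
    bufsize = strtoul(argv[2], NULL, 0);
    fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0 || posix_memalign(&buf, 4096, bufsize) != 0) {
        return 1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (;;) {
        ssize_t n = read(fd, buf, bufsize);
        if (n <= 0) {
            break;          /* EOF or error ends the run */
        }
        total += n;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("%zu-byte buffer: %lld bytes in %.5fs\n", bufsize, total,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}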

In practice, blktrace shows that the virtual address space is fragmented enough that the cap on I/O operations is not max_transfer but max_iov * 4096 (as it was before this series)... and yet the benefit effectively *begins* there, because that is where the cost of the system calls starts being amortized over multiple kernel<->disk communications.

Things are probably more complicated when more than one I/O is in flight, and with async I/O instead of read/write, but a huge part of the performance cost still seems to be the system calls (not just the context switch, but also pinning the I/O buffer and all the other ancillary costs).

So the solution is probably to add a max_hw_transfer limit in addition to max_transfer, and have max_hw_iov instead of max_iov to match.
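
(For context, the querying side this patch touches boils down to something like the following simplified sketch, not the actual file-posix.c code. Per the kernel ABI, BLKSECTGET on a block device fills an unsigned short with a limit counted in 512-byte sectors, while on an SG character device the same ioctl fills an int with a byte count; and per the subject of this patch, the result is used as-is instead of being rounded down to a power of 2.)

#include <errno.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/fs.h>    /* BLKSECTGET */

static long get_max_hw_transfer(int fd, const struct stat *st)
{
    if (S_ISBLK(st->st_mode)) {
        unsigned short max_sectors = 0;
        if (ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
            /* 512-byte sectors; no rounding to a power of 2. */
            return (long)max_sectors * 512;
        }
    } else {
        int max_bytes = 0;
        /* SG devices report the limit directly in bytes. */
        if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
            return max_bytes;
        }
    }
    return -errno;
}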

Paolo



