From: Paolo Bonzini
Subject: Re: [PATCH v2 2/6] file-posix: try BLKSECTGET on block devices too, do not round to power of 2
Date: Mon, 31 May 2021 18:36:19 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1

On 31/05/21 15:59, Kevin Wolf wrote:
>>> Apparently the motivation for Maxim's patch was, if I'm reading the description correctly, that it affected non-sg cases by imposing unnecessary restrictions. I see that patch 1 changed the max_iov part so that it won't affect non-sg cases any more, but max_transfer could still be more restricted than necessary, no?
>>
>> Indeed the kernel puts no limit at all, but especially with O_DIRECT we probably benefit from avoiding the moral equivalent of "bufferbloat".
>
> Yeah, that sounds plausible, but on the other hand the bug report Maxim addressed was about performance issues related to buffer sizes being too small. So even if we want to have some limit, max_transfer of the host device is probably not the right one for the general case.

Yeah, for a simple dd with O_DIRECT there is no real max_transfer, as long as you are willing to allocate a big enough buffer. Quick test on my laptop, reading 12.5 GiB:

  buffer size (bytes)   elapsed
               163840   9.46777s
               327680   9.41480s
               520192   9.39520s   (max_iov * 4K)
               614400   9.06289s
               655360   8.85762s
              1310720   8.75502s
              2621440   8.26522s
              5242880   7.88319s
             10485760   7.66751s
             20971520   7.42627s

In practice, blktrace shows that virtual address space is fragmented enough that the cap for I/O operations is not max_transfer but max_iov * 4096 (as it was before the series)... and yet the benefit effectively *begins* there, because that is where the cost of the system calls is amortized over multiple kernel<->disk communications.

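For anyone who wants to repeat this kind of measurement without dd, here is a rough Python stand-in. It is only a sketch: the function name `timed_read` and the choice of `os.readv` with a page-aligned anonymous mmap as the buffer are assumptions of this example, not what was used for the numbers above. `os.O_DIRECT` is Linux-only, and the buffer size must be a multiple of the device's logical block size.

```python
import mmap
import os
import time


def timed_read(path, buf_size, direct=True):
    """Read `path` sequentially, issuing one read call per `buf_size` bytes.

    Returns (bytes_read, read_calls, seconds). The call count includes the
    final read that returns 0 at end of file.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT  # bypass the page cache (Linux only)
    # An anonymous mmap is page-aligned, which satisfies O_DIRECT's
    # buffer alignment requirement.
    buf = mmap.mmap(-1, buf_size)
    fd = os.open(path, flags)
    total = 0
    calls = 0
    start = time.perf_counter()
    try:
        while True:
            n = os.readv(fd, [buf])
            calls += 1
            if n == 0:
                break
            total += n
    finally:
        os.close(fd)
        buf.close()
    return total, calls, time.perf_counter() - start


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python3 bench.py /dev/sdX
    for size in (163840, 520192, 20971520):
        total, calls, secs = timed_read(sys.argv[1], size)
        print(f"{size:>9} B  {calls:>7} reads  {secs:.3f}s")
```

Passing `direct=False` goes through the page cache instead, which is useful for a quick sanity check on filesystems that reject O_DIRECT.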
Things are probably more complicated if more than one I/O is in flight, and with async I/O instead of read/write, but still a huge part of performance is seemingly the cost of system calls (not just the context switch, also pinning the I/O buffer and all other ancillary costs).
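To put a number on the system call cost, the measurements can be turned into a read count and a throughput per buffer size. This is a back-of-the-envelope sketch: it assumes every read returns a full buffer and that 12.5 GiB means 12.5 * 2^30 bytes. The helper name `summarize` is made up for the example.

```python
TOTAL_BYTES = int(12.5 * 2**30)  # 12.5 GiB, as stated in the message

# (buffer size in bytes, elapsed seconds), copied from the quick test
RESULTS = [
    (163840, 9.46777),
    (327680, 9.41480),
    (520192, 9.39520),  # max_iov * 4K
    (614400, 9.06289),
    (655360, 8.85762),
    (1310720, 8.75502),
    (2621440, 8.26522),
    (5242880, 7.88319),
    (10485760, 7.66751),
    (20971520, 7.42627),
]


def summarize(buf_size, seconds, total=TOTAL_BYTES):
    """Return (number of full-buffer reads, throughput in GiB/s)."""
    calls = -(-total // buf_size)  # ceiling division
    gib_per_s = total / 2**30 / seconds
    return calls, gib_per_s


for buf_size, seconds in RESULTS:
    calls, rate = summarize(buf_size, seconds)
    print(f"{buf_size:>9} B  {calls:>6} reads  {rate:.2f} GiB/s")
```

Under those assumptions the smallest buffer needs 81920 reads and the largest only 640, while throughput goes from about 1.32 GiB/s to about 1.68 GiB/s.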
So the solution is probably to add a max_hw_transfer limit in addition to max_transfer, and have max_hw_iov instead of max_iov to match.
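One possible reading of that split, sketched in Python rather than QEMU's C so it can be run standalone. Everything here is hypothetical: the `Limits` class, `request_cap`, the "0 means no limit" convention, and the rule that the hardware limits apply only to passthrough (sg) requests are assumptions of this sketch, not the eventual patch. The 4096-byte factor reflects the worst case seen with blktrace, where each iovec entry covers a single page.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed worst case: one page per iovec entry


@dataclass
class Limits:
    # In this sketch, 0 means "no limit".
    max_transfer: int = 0     # applies to every request
    max_hw_transfer: int = 0  # host device limit, passthrough only
    max_hw_iov: int = 0       # host segment limit, passthrough only


def min_nonzero(a, b):
    """Minimum of two limits where 0 stands for "unlimited"."""
    if a == 0:
        return b
    if b == 0:
        return a
    return min(a, b)


def request_cap(limits, passthrough):
    """Largest request size in bytes (0 = unlimited) for this request type."""
    cap = limits.max_transfer
    if passthrough:
        cap = min_nonzero(cap, limits.max_hw_transfer)
        cap = min_nonzero(cap, limits.max_hw_iov * PAGE_SIZE)
    return cap


limits = Limits(max_transfer=0, max_hw_transfer=1 << 20, max_hw_iov=127)
print(request_cap(limits, passthrough=False))
print(request_cap(limits, passthrough=True))
```

With these example values, ordinary reads and writes stay unrestricted, while passthrough requests are capped by whichever hardware limit is tighter.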
Paolo