qemu-devel

From: Christian Borntraeger
Subject: Recent qemu patch results in aio failures with host DASD disks resulting in guest I/O errors
Date: Thu, 23 Sep 2021 11:18:31 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0

On 22.09.21 at 21:51, Halil Pasic wrote:
On Mon, 6 Sep 2021 16:24:20 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

On Fri, 25 Jun 2021 16:18:12 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

bs->sg is only true for character devices, but block devices can also
be used with scsi-block and scsi-generic.  Unfortunately BLKSECTGET
returns bytes in an int for /dev/sgN devices, and sectors in a short
for block devices, so account for that in the code.

The maximum transfer also need not be a power of 2 (for example I have
seen disks with 1280 KiB maximum transfer) so there's no need to pass
the result through pow2floor.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
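
To make the two conventions concrete, here is a minimal sketch of how the
two BLKSECTGET variants described above can be queried (illustrative C
only, not the actual QEMU code; the helper names are made up):

    #include <errno.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>                /* BLKSECTGET */

    /* SG character devices (/dev/sgN): BLKSECTGET fills an int with the
     * maximum transfer size in bytes. */
    static long sg_max_transfer_bytes(int fd)
    {
        int max_bytes = 0;

        if (ioctl(fd, BLKSECTGET, &max_bytes) < 0) {
            return -errno;
        }
        return max_bytes;
    }

    /* Block devices: BLKSECTGET fills an unsigned short with the maximum
     * transfer size in 512-byte sectors, so scale it to bytes. */
    static long blk_max_transfer_bytes(int fd)
    {
        unsigned short max_sectors = 0;

        if (ioctl(fd, BLKSECTGET, &max_sectors) < 0) {
            return -errno;
        }
        return (long)max_sectors * 512;
    }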

We have found that this patch leads to in-guest I/O errors when a DASD is
used as the source device, i.e. with a libvirt domain XML along the lines
of:

     <disk type='block' device='disk'>
       <driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
       <source dev='/dev/disk/by-id/ccw-XXXXXXX'/>
       <backingStore/>
       <target dev='vdb' bus='virtio'/>
       <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0008'/>
     </disk>

I don't think it is the fault of this patch: it LGTM. But it correlates
100%, and the problem seems to be related to the value of bl.max_iov,
which now comes from sysfs.

We are still investigating what is actually wrong. Just wanted to give
everybody a heads-up that this does seem to cause a nasty regression on
s390x, even if the code itself is perfect.


We have figured out what is going on here. The problem seems to be
specific to Linux AIO, which limits the number of iovec entries in a
single request to 1024 (UIO_MAXIOV).

Because of this, requests with more entries get rejected with -EINVAL by
the io_submit() syscall. Here is a real-world example:

(gdb) p *laiocb
$5 = {co = 0x3ff700072c0, ctx = 0x3fe60002650,
      iocb = {data = 0x0, aio_rw_flags = 0, key = 0, aio_lio_opcode = 8,
              aio_reqprio = 0, aio_fildes = 38,
              u = {c = {buf = 0x3ff70055bc0, nbytes = 1274,
                        offset = 19977953280, __pad3 = 0, flags = 1,
                        resfd = 43},
                   v = {vec = 0x3ff70055bc0, nr = 0, offset = 19977953280},
                   poll = {__pad1 = 1023, events = 1879399360},
                   saddr = {addr = 0x3ff70055bc0, len = 0}}},
      ret = -22, nbytes = 20586496, qiov = 0x3ff700382f8, is_read = false,
      next = {sqe_next = 0x0}}

On the host kernel side, the -EINVAL comes from this line in
iovec_from_user():
https://elixir.bootlin.com/linux/v5.15-rc2/source/lib/iov_iter.c#L1863
reached roughly via __do_sys_io_submit() -> io_submit_one() ->
aio_write() -> aio_setup_rw() -> __import_iovec().
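
The limit is easy to reproduce outside of QEMU. The following standalone
demo (my own test code, assuming only the behaviour described above)
submits a single vectored write with 1274 iovec entries, the same count as
in the gdb dump, and gets EINVAL back from io_submit():

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <sys/uio.h>
    #include <unistd.h>
    #include <linux/aio_abi.h>

    #define NR_IOVECS 1274               /* > UIO_MAXIOV (1024) */

    int main(void)
    {
        aio_context_t ctx = 0;
        static char buf[NR_IOVECS];
        static struct iovec iov[NR_IOVECS];
        struct iocb cb = { 0 };
        struct iocb *cbs[1] = { &cb };
        int fd = open("/tmp/aio-demo", O_RDWR | O_CREAT, 0600);

        if (fd < 0 || syscall(SYS_io_setup, 128, &ctx) < 0) {
            perror("setup");
            return 1;
        }
        for (int i = 0; i < NR_IOVECS; i++) {
            iov[i].iov_base = &buf[i];   /* 1274 one-byte segments */
            iov[i].iov_len = 1;
        }
        cb.aio_lio_opcode = IOCB_CMD_PWRITEV;   /* opcode 8, as in the dump */
        cb.aio_fildes = fd;
        cb.aio_buf = (unsigned long)iov;
        cb.aio_nbytes = NR_IOVECS;       /* iovec count for PWRITEV */

        if (syscall(SYS_io_submit, ctx, 1, cbs) < 0) {
            printf("io_submit: %s\n", strerror(errno));  /* "Invalid argument" */
        }
        syscall(SYS_io_destroy, ctx);
        return 0;
    }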

Based on an offline discussion with the DASD maintainers, and on the
in-tree Linux documentation (Documentation/block/queue-sysfs.rst and
Documentation/block/biodoc.rst), I believe that the DASD driver is not
wrong to advertise the value 65535 for queue/max_segments.
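
For reference, this is roughly how that value can be inspected from
userspace for an open block device (an illustrative sketch only, not the
code QEMU uses):

    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>

    /* Illustrative only: read queue/max_segments for the block device
     * behind 'fd'. On the DASD setup described above this returns 65535. */
    static long read_max_segments(int fd)
    {
        struct stat st;
        char path[128];
        long val = -1;
        FILE *f;

        if (fstat(fd, &st) < 0 || !S_ISBLK(st.st_mode)) {
            return -1;
        }
        snprintf(path, sizeof(path),
                 "/sys/dev/block/%u:%u/queue/max_segments",
                 major(st.st_rdev), minor(st.st_rdev));
        f = fopen(path, "r");
        if (!f) {
            return -1;
        }
        if (fscanf(f, "%ld", &val) != 1) {
            val = -1;
        }
        fclose(f);
        return val;
    }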

I believe QEMU jumps to the wrong conclusion in blk_get_max_iov() or in
virtio_blk_submit_multireq(); I can't really tell, because I'm not sure
what the semantics of blk_get_max_iov() are. But if I had to, I would bet
that blk_get_max_iov() returns the wrong value when Linux AIO is used.
I'm also not sure what the exact relationship between max_segments and
max_iov is.

One idea for fixing this would be to cap max_iov at UIO_MAXIOV (either
unconditionally, or only when Linux AIO is used). But I have to admit I
don't have full clarity here. I couldn't even find documentation that
states this limitation of Linux AIO (though I didn't look too hard for
it), so I cannot even be sure this is a QEMU problem.
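
As a sketch of that capping idea (illustrative only; IOV_MAX is the
userspace counterpart of the kernel's UIO_MAXIOV, and where QEMU would
apply this, and whether only for Linux AIO, is exactly the open question):

    #include <limits.h>                  /* IOV_MAX == 1024 on Linux */

    /* Illustrative only, not a tested patch: clamp a max_segments-derived
     * iovec limit so a single vectored request never exceeds what
     * io_submit() accepts. */
    static unsigned int cap_max_iov(unsigned int max_segments_from_sysfs)
    {
        if (max_segments_from_sysfs == 0 ||
            max_segments_from_sysfs > IOV_MAX) {
            return IOV_MAX;
        }
        return max_segments_from_sysfs;
    }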

That is why I decided to write this up first and start a discussion, with
the most relevant people in the loop, about who is playing foul. I intend
to try fixing the problem myself once my understanding of it reaches a
sufficient level.

Thanks in advance for your contribution!

Changing the subject to reflect what we see.
For reference, the QEMU patch is:

https://git.qemu.org/?p=qemu.git;a=commitdiff;h=18473467d55a20d643b6c9b3a52de42f705b4d35;hp=24b36e9813ec15da7db62e3b3621730710c5f020


