qemu-block

Re: Inconsistent block status reply in qemu-nbd in qemu-img-6.2.0


From: Hanna Reitz
Subject: Re: Inconsistent block status reply in qemu-nbd in qemu-img-6.2.0
Date: Mon, 17 Jan 2022 09:46:08 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.3.0

On 16.01.22 19:09, Nir Soffer wrote:
On Sun, Jan 16, 2022 at 1:17 PM Nir Soffer <nsoffer@redhat.com> wrote:
Some of our tests started to fail since qemu-img 6.2.0 was released.

Here is an example failure:

$ truncate -s1g /data/scratch/empty-1g.raw

$ qemu-nbd -r -t -e8 -k /tmp/nbd.sock -A -f raw /data/scratch/empty-1g.raw &
git bisect points to this commit:

$ git bisect good
0bc329fbb009f8601cec23bf2bc48ead0c5a5fa2 is the first bad commit
commit 0bc329fbb009f8601cec23bf2bc48ead0c5a5fa2
Author: Hanna Reitz <hreitz@redhat.com>
Date:   Thu Aug 12 10:41:44 2021 +0200

     block: block-status cache for data regions

The commit message says:

     (Note that only caching data regions but not zero regions means that
     returning false information from the cache is not catastrophic: Treating
     zeroes as data is fine.  While we try to invalidate the cache on zero
     writes and discards, such incongruences may still occur when there are
     other processes writing to the image.)

I don't think it is fine to report zero as data. This can cause severe
performance issues when users copy unneeded zero extents over the network,
instead of doing no work with zero bandwidth.
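
To make the cost concrete, here is a rough sketch of a client-side copy loop
that uses base:allocation to skip zero extents. It reuses the nbd.Client and
extents() API shown in the reproducer below; the NBD_STATE_ZERO constant and
the read() call are assumptions made for this illustration, not verified
ovirt-imageio API:

    # Illustration only: extents() is as used in the reproducer below;
    # NBD_STATE_ZERO and read() are assumed for this sketch.
    from ovirt_imageio._internal import nbd

    NBD_STATE_ZERO = 2  # per the NBD protocol spec

    def copy_image(sock_path, out_path, size, step=64 * 1024**2):
        with nbd.Client(nbd.UnixAddress(sock_path)) as c, open(out_path, "wb") as out:
            out.truncate(size)
            offset = 0
            while offset < size:
                length = min(step, size - offset)
                for ext in c.extents(offset, length)["base:allocation"]:
                    if ext.flags & NBD_STATE_ZERO:
                        pass  # zero extent: nothing to copy, no bandwidth used
                    else:
                        out.seek(offset)
                        out.write(c.read(offset, ext.length))  # read() assumed
                    offset += ext.length

If the server wrongly reports zeroes as data (flags=0), the else branch runs
for the whole image and every zero byte is transferred over the socket.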

You’re right, it was meant as a contrast to whether this might lead to catastrophic data failure.

But it was also meant as a risk estimation.  There wasn’t supposed to be a situation where the cache would report zeroes as data (other than with external writers), and so that’s why I thought it’d be fine.  But, well, things are supposed to be one way, and then you (me) write buggy code...

Thanks for the reproducer steps, I can reproduce the bug with your script (not with nbdinfo fwiw) and the problem seems to be this:

diff --git a/block/io.c b/block/io.c
index bb0a254def..83e94faeaf 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2498,7 +2498,8 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
              * the cache requires an RCU update, so double check here to avoid
              * such an update if possible.
              */
-            if (ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
+            if (want_zero &&
+                ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
                 QLIST_EMPTY(&bs->children))
             {
                 /*

We should update the cache only when we have accurate zero information, so only if want_zero was true.

Do we have a way to disable this cache, at least for nbd?

No, unfortunately not.  We could implement it, but I think the management layer would need to specify it for all protocol nodes where it wants the cache to be disabled.

Auto-detecting would be difficult for qemu, because it would basically mean detecting whether there are any exports in the block graph somewhere above the node in question (not even immediately above it, because for example in your example there’s a raw node between the export and the protocol node whose block-status information we’re caching).

I assume you’re now very skeptical of the cache, so even if we implement a fix for this problem, you’ll probably still want that option to disable it, right?

Hanna

Nir




$ nbdinfo --map nbd+unix:///?socket=/tmp/nbd.sock
          0  1073741824    3  hole,zero

$ python
Python 3.10.1 (main, Dec  9 2021, 00:00:00) [GCC 11.2.1 20211203 (Red Hat 11.2.1-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.

Enable debug logs so we can see all nbd commands and replies...

>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
Connect to qemu-nbd:

>>> from ovirt_imageio._internal import nbd
>>> c = nbd.Client(nbd.UnixAddress("/tmp/nbd.sock"))
DEBUG:nbd:Connecting address='/tmp/nbd.sock' export_name='' dirty=False
DEBUG:nbd:Received server flags: 3
DEBUG:nbd:Sending client flags: 1
DEBUG:nbd:Sending option: 8 data: b''
DEBUG:nbd:Received reply magic=3e889045565a9 option=8 type=1 len=0
DEBUG:nbd:Structured reply enabled
DEBUG:nbd:Sending option: 10 data:
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x0fbase:allocation\x00\x00\x00\x15qemu:allocation-depth')
DEBUG:nbd:Received reply magic=3e889045565a9 option=10 type=4 len=19
DEBUG:nbd:Meta context base:allocation is available id=0
DEBUG:nbd:Received reply magic=3e889045565a9 option=10 type=4 len=25
DEBUG:nbd:Meta context qemu:allocation-depth is available id=1
DEBUG:nbd:Received reply magic=3e889045565a9 option=10 type=1 len=0
DEBUG:nbd:Sending option: 7 data: bytearray(b'\x00\x00\x00\x00\x00\x00')
DEBUG:nbd:Received reply magic=3e889045565a9 option=7 type=3 len=14
DEBUG:nbd:Received block size info minimum=1 preferred=4096 maximum=33554432
DEBUG:nbd:Received reply magic=3e889045565a9 option=7 type=3 len=12
DEBUG:nbd:Received export info size=1073741824 flags=1423
DEBUG:nbd:Received reply magic=3e889045565a9 option=7 type=1 len=0
DEBUG:nbd:Ready for transmission

Get extents (first call):

>>> c.extents(0, 1024**3)
DEBUG:nbd:Sending NBD_CMD_BLOCK_STATUS handle=0 offset=0
length=1073741824 flags=0
DEBUG:nbd:Extent length=1073741824 flags=3 context=0
DEBUG:nbd:Extent length=1073741824 flags=1 context=1
{'base:allocation': [Extent(length=1073741824, flags=3)],
'qemu:allocation-depth': [Extent(length=1073741824, flags=0)]}

We got the flags = 3  (NBD_STATE_HOLE | NBD_STATE_ZERO)
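
For reference, a quick way to decode those bits; the constant values below are
taken from the NBD protocol specification, not from the client library used
above:

    # base:allocation flag bits as defined by the NBD protocol:
    NBD_STATE_HOLE = 0x1  # extent is unallocated (a hole)
    NBD_STATE_ZERO = 0x2  # extent reads back as zeroes

    flags = 3
    print(bool(flags & NBD_STATE_HOLE), bool(flags & NBD_STATE_ZERO))  # True True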

But in the next call...

>>> c.extents(0, 1024**3)
DEBUG:nbd:Sending NBD_CMD_BLOCK_STATUS handle=1 offset=0
length=1073741824 flags=0
DEBUG:nbd:Extent length=1073741824 flags=0 context=0
DEBUG:nbd:Extent length=1073741824 flags=1 context=1
{'base:allocation': [Extent(length=1073741824, flags=0)],
'qemu:allocation-depth': [Extent(length=1073741824, flags=0)]}

We got flags=0 (data)

Trying the same with a real image, using this reproducer script:

$ cat reporoduce.py
import logging
import pprint
from ovirt_imageio._internal import nbd

logging.basicConfig(level=logging.DEBUG)

with nbd.Client(nbd.UnixAddress("/tmp/nbd.sock")) as c:
     print("First call: c.extents(0, 10*1024**2) ->")
     pprint.pprint(c.extents(0, 10*1024**2))

     print("Second call: c.extents(0, 10*1024**2) ->")
     pprint.pprint(c.extents(0, 10*1024**2))


$ python reporoduce.py
DEBUG:nbd:Connecting address='/tmp/nbd.sock' export_name='' dirty=False
DEBUG:nbd:Received server flags: 3
DEBUG:nbd:Sending client flags: 1
DEBUG:nbd:Sending option: 8 data: b''
DEBUG:nbd:Received reply magic=3e889045565a9 option=8 type=1 len=0
DEBUG:nbd:Structured reply enabled
DEBUG:nbd:Sending option: 10 data:
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x0fbase:allocation\x00\x00\x00\x15qemu:allocation-depth')
DEBUG:nbd:Received reply magic=3e889045565a9 option=10 type=4 len=19
DEBUG:nbd:Meta context base:allocation is available id=0
DEBUG:nbd:Received reply magic=3e889045565a9 option=10 type=4 len=25
DEBUG:nbd:Meta context qemu:allocation-depth is available id=1
DEBUG:nbd:Received reply magic=3e889045565a9 option=10 type=1 len=0
DEBUG:nbd:Sending option: 7 data: bytearray(b'\x00\x00\x00\x00\x00\x00')
DEBUG:nbd:Received reply magic=3e889045565a9 option=7 type=3 len=14
DEBUG:nbd:Received block size info minimum=1 preferred=4096 maximum=33554432
DEBUG:nbd:Received reply magic=3e889045565a9 option=7 type=3 len=12
DEBUG:nbd:Received export info size=6442450944 flags=1423
DEBUG:nbd:Received reply magic=3e889045565a9 option=7 type=1 len=0
DEBUG:nbd:Ready for transmission
First call: c.extents(0, 10*1024**2) ->
DEBUG:nbd:Sending NBD_CMD_BLOCK_STATUS handle=0 offset=0 length=10485760 flags=0
DEBUG:nbd:Extent length=4096 flags=0 context=0
DEBUG:nbd:Extent length=1044480 flags=3 context=0
DEBUG:nbd:Extent length=65536 flags=0 context=0
DEBUG:nbd:Extent length=983040 flags=3 context=0
DEBUG:nbd:Extent length=24576 flags=0 context=0
DEBUG:nbd:Extent length=16384 flags=3 context=0
DEBUG:nbd:Extent length=8192 flags=0 context=0
DEBUG:nbd:Extent length=12288 flags=3 context=0
DEBUG:nbd:Extent length=8327168 flags=0 context=0
DEBUG:nbd:Extent length=10485760 flags=1 context=1
{'base:allocation': [Extent(length=4096, flags=0),
                      Extent(length=1044480, flags=3),
                      Extent(length=65536, flags=0),
                      Extent(length=983040, flags=3),
                      Extent(length=24576, flags=0),
                      Extent(length=16384, flags=3),
                      Extent(length=8192, flags=0),
                      Extent(length=12288, flags=3),
                      Extent(length=8327168, flags=0)],
  'qemu:allocation-depth': [Extent(length=10485760, flags=0)]}
Second call: c.extents(0, 10*1024**2) ->
DEBUG:nbd:Sending NBD_CMD_BLOCK_STATUS handle=1 offset=0 length=10485760 flags=0
DEBUG:nbd:Extent length=10485760 flags=0 context=0
DEBUG:nbd:Extent length=10485760 flags=1 context=1
{'base:allocation': [Extent(length=10485760, flags=0)],
  'qemu:allocation-depth': [Extent(length=10485760, flags=0)]}
DEBUG:nbd:Initiating a soft disconnect
DEBUG:nbd:Sending NBD_CMD_DISC handle=2 offset=0 length=0 flags=0


All extents were squashed into one data extent in the second call.

Testing shows that the issue is with calling block status again for the
same range. A call for a new range returns the expected results, while
calling again with the same range always returns flags=0.
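
A minimal check of that observation, using the same nbd.Client API as the
reproducer above (offsets and lengths are arbitrary illustration values,
assuming the 6 GiB image from the transcript):

    from ovirt_imageio._internal import nbd

    with nbd.Client(nbd.UnixAddress("/tmp/nbd.sock")) as c:
        first = c.extents(0, 10 * 1024**2)             # detailed hole/zero extents
        again = c.extents(0, 10 * 1024**2)             # same range: squashed to flags=0
        fresh = c.extents(20 * 1024**2, 10 * 1024**2)  # new range: detailed again

        print(first["base:allocation"] == again["base:allocation"])  # False with the bug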

For testing with ovirt-imageio you can get it from copr:
https://copr.fedorainfracloud.org/coprs/nsoffer/ovirt-imageio-preview/

     dnf copr enable nsoffer/ovirt-imageio-preview

But it should also be reproducible with libnbd.
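
For instance, an untested sketch with libnbd's Python bindings (the calls
follow libnbd's documented Python API, but exact signatures should be checked
against the installed version):

    import nbd  # libnbd Python bindings

    h = nbd.NBD()
    h.add_meta_context("base:allocation")
    h.connect_unix("/tmp/nbd.sock")

    def show(metacontext, offset, entries, err):
        # entries is a flat list alternating extent length and flags
        print(metacontext, offset, entries)
        return 0

    h.block_status(10 * 1024**2, 0, show)  # first call: expect hole/zero extents
    h.block_status(10 * 1024**2, 0, show)  # second call: squashed to data with the bug
    h.shutdown()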

Looks like a bad bug in 6.2.0.

Nir



