qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Inconsistent block status reply in qemu-nbd in qemu-img-6.2.0


From: Kevin Wolf
Subject: Re: Inconsistent block status reply in qemu-nbd in qemu-img-6.2.0
Date: Mon, 17 Jan 2022 12:38:31 +0100

Am 17.01.2022 um 10:55 hat Hanna Reitz geschrieben:
> On 17.01.22 10:52, Kevin Wolf wrote:
> > Am 17.01.2022 um 09:46 hat Hanna Reitz geschrieben:
> > > On 16.01.22 19:09, Nir Soffer wrote:
> > > > On Sun, Jan 16, 2022 at 1:17 PM Nir Soffer <nsoffer@redhat.com> wrote:
> > > > > Some of our tests started to fail since qemu-img 6.2.0 released.
> > > > > 
> > > > > Here is an example failure:
> > > > > 
> > > > > $ truncate -s1g /data/scratch/empty-1g.raw
> > > > > 
> > > > > $ qemu-nbd -r -t -e8 -k /tmp/nbd.sock -A -f raw 
> > > > > /data/scratch/empty-1g.raw &
> > > > git bisect points to this commit:
> > > > 
> > > > $ git bisect good
> > > > 0bc329fbb009f8601cec23bf2bc48ead0c5a5fa2 is the first bad commit
> > > > commit 0bc329fbb009f8601cec23bf2bc48ead0c5a5fa2
> > > > Author: Hanna Reitz <hreitz@redhat.com>
> > > > Date:   Thu Aug 12 10:41:44 2021 +0200
> > > > 
> > > >       block: block-status cache for data regions
> > > > 
> > > > The commit message says:
> > > > 
> > > >       (Note that only caching data regions but not zero regions means 
> > > > that
> > > >       returning false information from the cache is not catastrophic: 
> > > > Treating
> > > >       zeroes as data is fine.  While we try to invalidate the cache on 
> > > > zero
> > > >       writes and discards, such incongruences may still occur when 
> > > > there are
> > > >       other processes writing to the image.)
> > > > 
> > > > I don't think it is fine to report zero as data. This can cause severe
> > > > performance
> > > > issues when users copy unneeded zero extents over the network, instead 
> > > > of
> > > > doing no work with zero bandwidth.
> > > You’re right, it was meant as a contrast to whether this might lead to
> > > catastrophic data failure.
> > > 
> > > But it was also meant as a risk estimation.  There wasn’t supposed to be a
> > > situation where the cache would report zeroes as data (other than with
> > > external writers), and so that’s why I thought it’d be fine.  But, well,
> > > things are supposed to be one way, and then you (me) write buggy code...
> > > 
> > > Thanks for the reproducer steps, I can reproduce the bug with your script
> > > (not with nbdinfo fwiw) and the problem seems to be this:
> > > 
> > > diff --git a/block/io.c b/block/io.c
> > > index bb0a254def..83e94faeaf 100644
> > > --- a/block/io.c
> > > +++ b/block/io.c
> > > @@ -2498,7 +2498,8 @@ static int coroutine_fn
> > > bdrv_co_block_status(BlockDriverState *bs,
> > >                * the cache requires an RCU update, so double check here to
> > > avoid
> > >                * such an update if possible.
> > >                */
> > > -            if (ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
> > > +            if (want_zero &&
> > > +                ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
> > >                   QLIST_EMPTY(&bs->children))
> > >               {
> > >                   /*
> > > 
> > > We should update the cache only when we have accurate zero information, so
> > > only if want_zero was true.
> > Either that, or store the want_zero value as well and compare it before
> > using the cache.
> > 
> > Depends on whether we need the optimisation for want_zero=false cases,
> > too. I think this mostly means bdrv_is_allocated(_above)(), which seems
> > to be relevant for the commit and stream jobs and 'qemu-img rebase' at
> > least and some less commonly used stuff. There are some other cases of
> > is_allocated (like in mirror) where it doesn't make a difference because
> > we always process the maximum number of bytes with the first call.
> 
> I think we need the cache only for want_zero=true cases, because allocation
> information is handled by the format layer, which doesn’t use the cache at
> all.  Also, the reason for the cache was to skip slow SEEK_HOLE/DATA calls
> on the protocol level, which is a want_zero=true case.

Hm, good point. But we would still call the protocol driver with
want_zero=false for raw images.

Obviously, this removes the commit case because it calls is_allocated
only on the overlay and that can't be raw. However, streaming (or more
specifically the copy-on-read driver) does seem to include the base
image in its bdrv_is_allocated_above(), which may be raw.

So it looks to me as if the optimisation were still relevant when you're
streaming from a raw base image into a qcow2 overlay.

Kevin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]