qemu-block

Re: [PULL 18/20] block/nbd: drop connection_co


From: Eric Blake
Subject: Re: [PULL 18/20] block/nbd: drop connection_co
Date: Wed, 2 Feb 2022 07:53:53 -0600
User-agent: NeoMutt/20211029-256-77b59a

On Wed, Feb 02, 2022 at 12:49:36PM +0100, Fabian Ebner wrote:
> On 27.09.21 at 23:55, Eric Blake wrote:
> > From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > 
> > OK, that's a big rewrite of the logic.
> > 
> > Pre-patch we have an always running coroutine - connection_co. It does
> > reply receiving and reconnecting. And it leads to a lot of difficult
> > and unobvious code around drained sections and context switch. We also
> > abuse the bs->in_flight counter, which is increased for connection_co and
> > temporarily decreased at points where we want to allow a drained section
> > to begin. One of these places is in another file: in nbd_read_eof() in
> > nbd/client.c.
> > 
> > We also cancel reconnect and requests waiting for reconnect on drained
> > begin which is not correct. And this patch fixes that.
> > 
> > Let's finally drop this always running coroutine and go another way:
> > do both reconnect and receiving in request coroutines.
> >
> 
> Hi,
> 
> while updating our stack to 6.2, one of our live-migration tests stopped
> working (backtrace is below) and bisecting led me to this patch.
> 
> The VM has a single qcow2 disk (converting to raw doesn't make a
> difference) and the issue only appears when using an iothread (for both
> virtio-scsi-pci and virtio-block-pci).
> 
> Reverting 1af7737871fb3b66036f5e520acb0a98fc2605f7 (which lives on top)
> and 4ddb5d2fde6f22b2cf65f314107e890a7ca14fcf (the commit corresponding
> to this patch) in v6.2.0 makes the migration work again.
> 
> Backtrace:
> 
> Thread 1 (Thread 0x7f9d93458fc0 (LWP 56711) "kvm"):
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007f9d9d6bc537 in __GI_abort () at abort.c:79
> #2  0x00007f9d9d6bc40f in __assert_fail_base (fmt=0x7f9d9d825128
> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5579153763f8
> "qemu_get_current_aio_context() == qemu_coroutine_get_aio_context(co)",
> file=0x5579153764f9 "../io/channel.c", line=483, function=<optimized
> out>) at assert.c:92

Given that this assertion is about which aio context is set, I wonder
if the conversation at
https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg00096.html is
relevant; if so, Vladimir may already be working on the patch.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



