qemu-devel

Re: [PATCH 00/13] block: remove aio_disable_external() API


From: Stefan Hajnoczi
Subject: Re: [PATCH 00/13] block: remove aio_disable_external() API
Date: Tue, 4 Apr 2023 17:04:42 -0400

On Tue, Apr 04, 2023 at 03:43:20PM +0200, Paolo Bonzini wrote:
> On 4/3/23 20:29, Stefan Hajnoczi wrote:
> > The aio_disable_external() API temporarily suspends file descriptor
> > monitoring in the event loop. The block layer uses this to prevent new
> > I/O requests being submitted from the guest and elsewhere between
> > bdrv_drained_begin() and bdrv_drained_end().
> > 
> > While the block layer still needs to prevent new I/O requests in drained
> > sections, the aio_disable_external() API can be replaced with
> > .drained_begin/end/poll() callbacks that have been added to
> > BdrvChildClass and BlockDevOps.
> > 
> > This newer .drained_begin/end/poll() approach is attractive because it
> > works without specifying a specific AioContext. The block layer is
> > moving towards multi-queue, and that means multiple AioContexts may be
> > processing I/O simultaneously.
> > 
> > The aio_disable_external() API was always somewhat hacky. It suspends
> > all file descriptors that were registered with is_external=true, even
> > if they have nothing to do with the BlockDriverState graph nodes that
> > are being drained. It's better to solve a block layer problem in the
> > block layer than to have an odd event loop API solution.
> > 
> > That covers the motivation for this change; now on to the specifics of
> > this series:
> > 
> > While it would be nice if a single conceptual approach could be applied
> > to all is_external=true file descriptors, I ended up looking at callers
> > on a case-by-case basis. There are two general ways I migrated code
> > away from is_external=true:
> > 
> > 1. Block exports are typically best off unregistering fds in
> >    .drained_begin() and registering them again in .drained_end(). The
> >    .drained_poll() function waits for in-flight requests to finish
> >    using a reference counter.
> > 
> > 2. Emulated storage controllers like virtio-blk and virtio-scsi are a
> >    little simpler. They can rely on BlockBackend's request queuing
> >    feature during drain: guest I/O request coroutines are suspended in
> >    a drained section and resume when the drained section ends.
> 
> Sorry, I disagree with this.
> 
> Request queuing was shown to cause deadlocks; Hanna's latest patch piles
> another hack on top of it. In my opinion we should instead move in the
> direction of relying _less_ (or not at all) on request queuing.
> 
> I am strongly convinced that request queuing must apply only after
> bdrv_drained_begin() has returned, which would also fix the IDE TRIM bug
> reported by Fiona Ebner. The possible livelock scenario is generally not
> a problem because 1) outside an iothread you have the BQL anyway, which
> prevents a vCPU from issuing more I/O operations during
> bdrv_drained_begin(), and 2) in iothreads you have
> aio_disable_external() instead of .drained_begin().
> 
> It is also less tidy to start a request during the drained_begin phase,
> because a request that has been submitted has to be completed (cancel
> doesn't really work).
> 
> So in an ideal world, request queuing would apply only after
> bdrv_drained_begin() has returned; it would also log a warning, and
> .drained_begin() should set things up so that there are no such
> warnings.

That's fine, I will give .drained_begin/end/poll() a try with virtio-blk
and virtio-scsi in the next revision.

Stefan


