From: Sergio Lopez
Subject: Re: [Qemu-block] [PATCH] virtio-blk: dataplane: release AioContext before blk_set_aio_context
Date: Thu, 28 Feb 2019 16:01:19 +0100
User-agent: NeoMutt/20180716

On Wed, Feb 27, 2019 at 06:37:14PM +0100, Kevin Wolf wrote:
> Am 27.02.2019 um 17:52 hat Sergio Lopez geschrieben:
> > Stopping the dataplane requires calling blk_set_aio_context(), which
> > may need to wait for a running job to be completed or paused.
> >
> > As stopping the dataplane can be triggered from a vcpu thread (when the
> > guest requests to stop the device), while the job itself may be managed
> > by an iothread, holding the AioContext leads to a deadlock: the former
> > is waiting for the job to pause or finish, while the latter can't make
> > any progress because it is waiting for the AioContext to be released:
> > 
> >  Thread 6 (LWP 90472)
> >  #0  0x000055a6f8497aee in blk_root_drained_end
> >  #1  0x000055a6f84a7926 in bdrv_parent_drained_end
> >  #2  0x000055a6f84a799f in bdrv_do_drained_end
> >  #3  0x000055a6f84a82ab in bdrv_drained_end
> >  #4  0x000055a6f8498be8 in blk_drain
> >  #5  0x000055a6f84a22cd in mirror_drain
> >  #6  0x000055a6f8457708 in block_job_detach_aio_context
> >  #7  0x000055a6f84538f1 in bdrv_detach_aio_context
> >  #8  0x000055a6f8453a96 in bdrv_set_aio_context
> >  #9  0x000055a6f84999f8 in blk_set_aio_context
> >  #10 0x000055a6f81802c8 in virtio_blk_data_plane_stop
> >  #11 0x000055a6f83cfc95 in virtio_bus_stop_ioeventfd
> >  #12 0x000055a6f83d1f81 in virtio_pci_common_write
> >  #13 0x000055a6f83d1f81 in virtio_pci_common_write
> >  #14 0x000055a6f8148d62 in memory_region_write_accessor
> >  #15 0x000055a6f81459c9 in access_with_adjusted_size
> >  #16 0x000055a6f814a608 in memory_region_dispatch_write
> >  #17 0x000055a6f80f1f98 in flatview_write_continue
> >  #18 0x000055a6f80f214f in flatview_write
> >  #19 0x000055a6f80f6a7b in address_space_write
> >  #20 0x000055a6f80f6b15 in address_space_rw
> >  #21 0x000055a6f815da08 in kvm_cpu_exec
> >  #22 0x000055a6f8132fee in qemu_kvm_cpu_thread_fn
> >  #23 0x000055a6f8551306 in qemu_thread_start
> >  #24 0x00007f9bdf5b9dd5 in start_thread
> >  #25 0x00007f9bdf2e2ead in clone
> > 
> >  Thread 8 (LWP 90467)
> >  #0  0x00007f9bdf5c04ed in __lll_lock_wait
> >  #1  0x00007f9bdf5bbde6 in _L_lock_941
> >  #2  0x00007f9bdf5bbcdf in __GI___pthread_mutex_lock
> >  #3  0x000055a6f8551447 in qemu_mutex_lock_impl
> >  #4  0x000055a6f854be77 in co_schedule_bh_cb
> >  #5  0x000055a6f854b781 in aio_bh_poll
> >  #6  0x000055a6f854b781 in aio_bh_poll
> >  #7  0x000055a6f854f01b in aio_poll
> >  #8  0x000055a6f825a488 in iothread_run
> >  #9  0x000055a6f8551306 in qemu_thread_start
> >  #10 0x00007f9bdf5b9dd5 in start_thread
> >  #11 0x00007f9bdf2e2ead in clone
> > 
> >  (gdb) thread 8
> >  [Switching to thread 8 (Thread 0x7f9bd6dae700 (LWP 90467))]
> >  #3  0x000055a6f8551447 in qemu_mutex_lock_impl (mutex=0x55a6fac8fea0,
> >      file=0x55a6f8730c3f "util/async.c", line=511) at util/qemu-thread-posix.c:66
> >  66     err = pthread_mutex_lock(&mutex->lock);
> >  (gdb) up 3
> >  #3  0x000055a6f8551447 in qemu_mutex_lock_impl (mutex=0x55a6fac8fea0,
> >      file=0x55a6f8730c3f "util/async.c", line=511) at util/qemu-thread-posix.c:66
> >  66     err = pthread_mutex_lock(&mutex->lock);
> >  (gdb) p mutex.lock.__data.__owner
> >  $6 = 90472
> > 
> > Signed-off-by: Sergio Lopez <address@hidden>
> > ---
> >  hw/block/dataplane/virtio-blk.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> > index 8c37bd314a..358e6ae89b 100644
> > --- a/hw/block/dataplane/virtio-blk.c
> > +++ b/hw/block/dataplane/virtio-blk.c
> > @@ -280,12 +280,11 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
> >  
> >      aio_context_acquire(s->ctx);
> >      aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
> > +    aio_context_release(s->ctx);
> >  
> >      /* Drain and switch bs back to the QEMU main loop */
> >      blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
> >  
> > -    aio_context_release(s->ctx);
> > -
> 
> blk_set_aio_context() requires that the caller hold the lock for the
> source context, so I'm afraid this is wrong.

TBH, I was quite sure this patch was wrong myself, but I thought it was
still a good way to illustrate the issue.
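
For what it's worth, the deadlock boils down to a pattern that can be
reproduced with a toy program (plain pthreads, not QEMU code; all names
below are made up for illustration): the "vcpu" thread holds the context
lock and spins waiting for work that only the "iothread" can finish,
while the "iothread" is blocked trying to take that same lock.

/* Toy model of the deadlock -- plain pthreads, not QEMU code. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static bool job_done;

/* Stand-in for the iothread: it needs ctx_lock to finish the job. */
static void *iothread_fn(void *arg)
{
    pthread_mutex_lock(&ctx_lock);
    job_done = true;
    pthread_mutex_unlock(&ctx_lock);
    return NULL;
}

/* Stand-in for the vcpu thread: it already holds ctx_lock and waits
 * for the job, so neither thread can ever make progress. */
static void *vcpu_fn(void *arg)
{
    pthread_mutex_lock(&ctx_lock);
    while (!job_done) {
        usleep(1000);   /* "wait for the job to pause or finish" */
    }
    pthread_mutex_unlock(&ctx_lock);
    return NULL;
}

int main(void)
{
    pthread_t vcpu, iothread;

    pthread_create(&vcpu, NULL, vcpu_fn, NULL);
    pthread_create(&iothread, NULL, iothread_fn, NULL);
    pthread_join(vcpu, NULL);       /* hangs forever: the deadlock */
    pthread_join(iothread, NULL);
    printf("done\n");               /* never reached */
    return 0;
}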

> However, I think the problem might already be fixed with my "block: Use
> normal drain for bdrv_set_aio_context()" (contained in a pull request,
> which isn't merged yet), which makes bdrv_set_aio_context() use a real
> drain operation. This way, the requests of the mirror jobs should
> already be drained before we even call into block_job_detach_aio_context().
> 
> Can you give this a try and see whether it fixes your problem?

I've applied your patchset to my local copy, but it doesn't fix the
issue.

The problem is that the coroutine is already scheduled to run in the
iothread context, which means job->busy == true, so we can't switch to
it from anywhere else.

I think we should give it a chance to run. What do you think of
something like this:

diff --git a/blockjob.c b/blockjob.c
index 58de8cb..1695bef 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -135,9 +135,8 @@ static void block_job_detach_aio_context(void *opaque)

     job_pause(&job->job);

-    while (!job->job.paused && !job_is_completed(&job->job)) {
-        job_drain(&job->job);
-    }
+    AIO_WAIT_WHILE(job->job.aio_context,
+        (job_drain(&job->job), !job->job.paused &&
+         !job_is_completed(&job->job)));

     job->job.aio_context = NULL;
     job_unref(&job->job);
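
My understanding of why this should help (an assumption about
AIO_WAIT_WHILE's contract, so please double-check it): unlike the
open-coded loop it replaces, AIO_WAIT_WHILE, when called from the main
loop with the job's AioContext acquired, releases that context while it
polls, which is exactly what gives the already-scheduled coroutine a
chance to run. In terms of the toy model above, it's the difference
between spinning with the lock held and dropping it while waiting:

/* Variant of the toy model above (reuses ctx_lock/job_done): the waiter
 * drops the lock while waiting -- roughly what AIO_WAIT_WHILE does with
 * an acquired AioContext -- so the other thread can run and the loop
 * terminates. Not QEMU code. */
static void *vcpu_fn_fixed(void *arg)
{
    pthread_mutex_lock(&ctx_lock);
    while (!job_done) {
        pthread_mutex_unlock(&ctx_lock);   /* let the "iothread" in */
        usleep(1000);
        pthread_mutex_lock(&ctx_lock);
    }
    pthread_mutex_unlock(&ctx_lock);
    return NULL;
}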

Thanks,
Sergio (slp).


