qemu-devel

From: Sergio Lopez
Subject: Re: [Qemu-devel] [PATCH v2] mirror: Confirm we're quiesced only if the job is paused or cancelled
Date: Fri, 08 Mar 2019 16:45:17 +0100
User-agent: mu4e 1.0; emacs 26.1

Kevin Wolf writes:

> Am 07.03.2019 um 19:54 hat Sergio Lopez geschrieben:
>> While child_job_drained_begin() calls job_pause(), the job doesn't
>> actually transition between states until it runs again and reaches a
>> pause point. This means bdrv_drained_begin() may return with some jobs
>> using the node still having 'busy == true'.
>> 
>> As a consequence, block_job_detach_aio_context() may get into a
>> deadlock, waiting for the job to be actually paused, while the coroutine
>> servicing the job is yielding and doesn't get the opportunity to get
>> scheduled again. This situation can be reproduced by issuing a
>> 'block-commit' immediately followed by a 'device_del'.
>> 
>> To ensure bdrv_drained_begin() only returns when the jobs have been
>> paused, we change mirror_drained_poll() to only confirm it's quiesced
>> when job->paused == true and there aren't any in-flight requests, except
>> if we reached that point by a drained section initiated by the
>> mirror/commit job itself.
>> 
>> The other block jobs shouldn't need any changes, as the default
>> drained_poll() behavior is to only confirm it's quiesced if the job is
>> not busy or completed.
>> 
>> Signed-off-by: Sergio Lopez <address@hidden>
>> 
>> ---
>> v2
>>   - Fix typo (thanks to Eric Blake)
>> ---
>>  block/mirror.c | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>> 
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 726d3c27fb..1a1fb174b6 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -80,6 +80,7 @@ typedef struct MirrorBlockJob {
>>      bool initial_zeroing_ongoing;
>>      int in_active_write_counter;
>>      bool prepared;
>> +    bool in_drain;
>>  } MirrorBlockJob;
>>  
>>  typedef struct MirrorBDSOpaque {
>> @@ -679,9 +680,11 @@ static int mirror_exit_common(Job *job)
>>  
>>          /* The mirror job has no requests in flight any more, but we need to
>>           * drain potential other users of the BDS before changing the graph. */
>> +        s->in_drain = true;
>>          bdrv_drained_begin(target_bs);
>>          bdrv_replace_node(to_replace, target_bs, &local_err);
>>          bdrv_drained_end(target_bs);
>> +        s->in_drain = false;
>>          if (local_err) {
>>              error_report_err(local_err);
>>              ret = -EPERM;
>
> I think this hunk is wrong because this is nested: s->in_drain is
> already true before this block, so we're setting it to false too early.
> We can either drop it completely or just assert(s->in_drain).

You're right, I'll send a v3 with an assert instead of touching the
value there.
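
For illustration only, here is a rough, self-contained sketch of the idea,
not the actual v3 patch. ToyMirrorJob and the toy_* functions are
hypothetical stand-ins for the real QEMU types and calls, and the poll
predicate is only my reading of the commit message above. The sketch also
assumes that some outer code has already set in_drain before the nested
section runs; where exactly that happens is not visible in the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the job state. Field names follow the patch, but this
 * is NOT the real MirrorBlockJob. */
typedef struct ToyMirrorJob {
    bool paused;     /* the job has reached a pause point */
    bool cancelled;  /* the job has been cancelled */
    bool in_drain;   /* the job itself opened the current drained section */
    int in_flight;   /* number of outstanding requests */
} ToyMirrorJob;

/* Returns true while the drain has to keep polling, i.e. while the job
 * cannot yet be considered quiesced. */
static bool toy_drained_poll(const ToyMirrorJob *s)
{
    /* A drain started by someone else must wait until the job is really
     * paused or cancelled. A drain started by the job itself is exempt,
     * otherwise the job would end up waiting for itself. */
    if (!s->paused && !s->cancelled && !s->in_drain) {
        return true;
    }
    return s->in_flight > 0;
}

/* Nested drained section, as in the hunk under review. Instead of
 * toggling in_drain (which would clear the outer section's flag too
 * early), it only asserts that the flag is already set. */
static void toy_replace_node_nested(ToyMirrorJob *s)
{
    assert(s->in_drain);
    /* The real code would call bdrv_drained_begin(), bdrv_replace_node()
     * and bdrv_drained_end() here. */
}
```

The point of the assert is that the nested section documents its
precondition without changing state that belongs to the outer section.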

Thanks!
Sergio (slp).



