Re: [Qemu-devel] [PATCH v7 22/24] block: Rewrite bdrv_close_all()


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v7 22/24] block: Rewrite bdrv_close_all()
Date: Mon, 16 Nov 2015 17:15:21 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 12.11.2015 08:34, Fam Zheng wrote:
> On Mon, 11/09 23:39, Max Reitz wrote:
>> This patch rewrites bdrv_close_all(): Until now, all root BDSs have been
>> force-closed. This is bad because it can lead to cached data not being
>> flushed to disk.
>>
>> Instead, try to make all reference holders relinquish their reference
>> voluntarily:
>>
>> 1. All BlockBackend users are handled by making all BBs simply eject
>>    their BDS tree. Since a BDS can never be on top of a BB, this will
>>    not cause any of the issues as seen with the force-closing of BDSs.
>>    The references will be relinquished and any further access to the BB
>>    will fail gracefully.
>> 2. All BDSs which are owned by the monitor itself (because they do not
>>    have a BB) are relinquished next.
>> 3. Besides BBs and the monitor, block jobs and other BDSs are the only
>>    things left that can hold a reference to BDSs. After every remaining
>>    block job has been canceled, there should not be any BDSs left (and
>>    the loop added here will always terminate (as long as NDEBUG is not
>>    defined), because either all_bdrv_states will be empty or there will
>>    not be any block job left to cancel, failing the assertion).
>>
>> Signed-off-by: Max Reitz <address@hidden>
>> Reviewed-by: Kevin Wolf <address@hidden>
>> ---
>>  block.c | 45 +++++++++++++++++++++++++++++++++++++--------
>>  1 file changed, 37 insertions(+), 8 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index b935dff..e3d5aea 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1905,9 +1905,7 @@ static void bdrv_close(BlockDriverState *bs)
>>  {
>>      BdrvAioNotifier *ban, *ban_next;
>>  
>> -    if (bs->job) {
>> -        block_job_cancel_sync(bs->job);
>> -    }
>> +    assert(!bs->job);
>>  
>>      /* Disable I/O limits and drain all pending throttled requests */
>>      if (bs->throttle_state) {
>> @@ -1971,13 +1969,44 @@ static void bdrv_close(BlockDriverState *bs)
>>  void bdrv_close_all(void)
>>  {
>>      BlockDriverState *bs;
>> +    AioContext *aio_context;
>> +    int original_refcount = 0;
>>  
>> -    QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
>> -        AioContext *aio_context = bdrv_get_aio_context(bs);
>> +    /* Drop references from requests still in flight, such as canceled block
>> +     * jobs whose AIO context has not been polled yet */
>> +    bdrv_drain_all();
>>  
>> -        aio_context_acquire(aio_context);
>> -        bdrv_close(bs);
>> -        aio_context_release(aio_context);
>> +    blockdev_close_all_bdrv_states();
>> +    blk_remove_all_bs();
>> +
>> +    /* Cancel all block jobs */
>> +    while (!QTAILQ_EMPTY(&all_bdrv_states)) {
>> +        QTAILQ_FOREACH(bs, &all_bdrv_states, bs_list) {
>> +            aio_context = bdrv_get_aio_context(bs);
>> +
>> +            aio_context_acquire(aio_context);
>> +            if (bs->job) {
>> +                /* So we can safely query the current refcount */
>> +                bdrv_ref(bs);
>> +                original_refcount = bs->refcnt;
>> +
>> +                block_job_cancel_sync(bs->job);
>> +                aio_context_release(aio_context);
>> +                break;
>> +            }
>> +            aio_context_release(aio_context);
>> +        }
>> +
>> +        /* All the remaining BlockDriverStates are referenced directly or
>> +         * indirectly from block jobs, so there needs to be at least one BDS
>> +         * directly used by a block job */
>> +        assert(bs);
>> +
>> +        /* Wait for the block job to release its reference */
>> +        while (bs->refcnt >= original_refcount) {
>> +            aio_poll(aio_context, true);
>> +        }
>> +        bdrv_unref(bs);
> 
> If at this point bs->refcnt is greater than 1, why don't we care where
> the remaining references come from?

We do care. A BDS will not be removed from all_bdrv_states until it is
deleted (i.e. its refcount becomes 0). Therefore, this loop will
continue until all BDSs have been deleted.
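
(As a reminder of why that holds: the unref path ties list membership
to the refcount. Roughly, paraphrasing block.c from memory rather than
quoting it verbatim:

    void bdrv_unref(BlockDriverState *bs)
    {
        assert(bs->refcnt > 0);
        if (--bs->refcnt == 0) {
            /* bdrv_delete() also takes bs off all_bdrv_states */
            bdrv_delete(bs);
        }
    }

So "still on all_bdrv_states" and "not yet deleted" are the same
condition.)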

So where might additional references come from? Since at this point
only block jobs can hold references to BDSs (directly or indirectly),
that is exactly where they come from:

(1) You might have multiple block jobs running on one BDS in the
    future. Then you would cancel them one by one, and after canceling
    the first one, the refcount will still be greater than 1 before
    the bdrv_unref().

(2) Imagine a BDS B with a parent BDS A, and block jobs running on
    both of them. B is then referenced both by its own block job and
    by A (and thus indirectly by the block job referencing A). If we
    cancel the job on B before the one on A, the refcount of B will
    still be greater than 1 before the bdrv_unref(), because B is
    still referenced by its parent until the block job on A has been
    canceled, too (see the toy trace below).
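
To make the refcount arithmetic in (2) concrete, here is a toy trace
(standalone C with made-up counters standing in for real BDSs, not
actual QEMU code):

    #include <assert.h>

    int main(void)
    {
        /* B is referenced by its own job and by its parent A;
         * A is referenced only by its own job. */
        int ref_B = 2;
        int ref_A = 1;

        /* Cancel the job on B first: only the job's reference goes
         * away; A still holds B, so B survives. */
        ref_B--;
        assert(ref_B == 1);

        /* Cancel the job on A: A is deleted, which in turn releases
         * its reference to B, and only now is B deleted as well. */
        ref_A--;
        assert(ref_A == 0);  /* A deleted */
        ref_B--;             /* A's reference to B dropped */
        assert(ref_B == 0);  /* B deleted */

        return 0;
    }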

The first cannot happen right now, but the second one may; I'm not
sure (it depends on whether op blockers allow it).


And we do make sure that there are no additional references besides:
- directly from the monitor (monitor-owned BDSs)
- from a BB
- from block jobs
- (+ everything transitively through the respective BDS tree)

If there were additional references, the inner loop would at some point
no longer find a BDS with a block job while all_bdrv_states is still not
empty. That's what the "assert(bs)" after the inner loop is for.
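
(The reason assert(bs) can catch this: QTAILQ_FOREACH leaves its
iteration variable NULL when it runs off the end of the list without
hitting a break. A self-contained illustration of the same idiom,
using the plain BSD <sys/queue.h> macros that QEMU's QTAILQ mirrors:

    #include <assert.h>
    #include <sys/queue.h>

    struct node {
        int has_job;
        TAILQ_ENTRY(node) link;
    };

    int main(void)
    {
        TAILQ_HEAD(, node) list = TAILQ_HEAD_INITIALIZER(list);
        struct node a = { .has_job = 1 };
        struct node *n;

        TAILQ_INSERT_TAIL(&list, &a, link);

        TAILQ_FOREACH(n, &list, link) {
            if (n->has_job) {
                break;  /* found one; n stays non-NULL */
            }
        }

        /* Had no node carried a job, the loop would have terminated
         * with n == NULL -- exactly the condition assert(bs) checks. */
        assert(n != NULL);
        return 0;
    }

)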

If you can think of another place a reference to a BDS might come
from, that would be a bug in this patch, and we would have to make
sure to handle that case, too.

Max
