qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v9 13/13] block/backup: use backup-top instead o


From: Max Reitz
Subject: Re: [Qemu-devel] [PATCH v9 13/13] block/backup: use backup-top instead of write notifiers
Date: Mon, 2 Sep 2019 18:34:15 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 29.08.19 16:55, Vladimir Sementsov-Ogievskiy wrote:
> 28.08.2019 22:50, Max Reitz wrote:
>> On 26.08.19 18:13, Vladimir Sementsov-Ogievskiy wrote:
>>> Drop write notifiers and use filter node instead.
>>>
>>> = Changes =
>>>
>>> 1. add filter-node-name argument for backup qmp api. We have to do it
>>> in this commit, as 257 needs to be fixed.
>>
>> I feel a bit bad about it not being an implicit node.  But I know you
>> don’t like them, they’re probably just broken altogether and because
>> libvirt doesn’t use backup (yet), probably nobody cares.
>>
>>> 2. there no move write notifiers here, so is_write_notifier parameter
>>
>> s/there/there are/, I suppose?
>>
>>> is dropped from block-copy paths.
>>>
>>> 3. Intersecting requests handling changed, now synchronization between
>>> backup-top, backup and guest writes are all done in block/block-copy.c
>>> and works as follows:
>>>
>>> On copy operation, we work only with dirty areas. If bits are dirty it
>>> means that there are no requests intersecting with this area. We clear
>>> dirty bits and take bdrv range lock (bdrv_co_try_lock) on this area to
>>> prevent further operations from interaction with guest (only with
>>> guest, as neither backup nor backup-top will touch non-dirty area). If
>>> copy-operation failed we set dirty bits back together with releasing
>>> the lock.
>>>
>>> The actual difference with old scheme is that on guest writes we
>>> don't lock the whole region but only dirty-parts, and to be more
>>> precise: only dirty-part we are currently operate on. In old scheme
>>> guest write to non-dirty area (which may be safely ignored by backup)
>>> may wait for intersecting request, touching some other area which is
>>> dirty.
>>>
>>> 4. To sync with in-flight requests at job finish we now have drained
>>> removing of the filter, we don't need rw-lock.
>>>
>>> = Notes =
>>>
>>> Note the consequence of three objects appearing: backup-top, backup job
>>> and block-copy-state:
>>>
>>> 1. We want to insert backup-top before job creation, to behave similar
>>> with mirror and commit, where job is started upon filter.
>>>
>>> 2. We also have to create block-copy-state after filter injection, as
>>> we don't want it's source child be replaced by fitler. Instead we want
>>
>> s/it's/its/, s/filter/filter/ (although “fitler” does have an amusing
>> ring to it)
>>
>>> to keep BCS.source to be real source node, as we want to use
>>> bdrv_co_try_lock in CBW operations and it can't be used on filter, as
>>> on filter we already have in-flight (write) request from upper layer.
>>
>> Reasonable, even more so as I suppose BCS.source should eventually be a
>> BdrvChild of backup-top.
>>
>> What looks wrong is that the sync_bitmap is created on the source (“bs”
>> in backup_job_create()), but backup_cleanup_sync_bitmap() still assumes
>> it is on blk_bs(job->common.blk) (which is no longer true).
>>
>>> So, we firstly create inject backup-top, then create job and BCS. BCS
>>> is the latest just to not create extra variable for it. Finally we set
>>> bcs for backup-top filter.
>>>
>>> = Iotest changes =
>>>
>>> 56: op-blocker doesn't shot now, as we set it on source, but then check
>>
>> s/shot/show/?
>>
>>> on filter, when trying to start second backup, so error caught in
>>> test_dismiss_collision is changed. It's OK anyway, as this test-case
>>> seems to test that after some collision we can dismiss first job and
>>> successfully start the second one. So, the change is that collision is
>>> changed from op-blocker to file-posix locks.
>>
>> Well, but the op blocker belongs to the source, which should be covered
>> by internal permissions.  The fact that it now shows up as a file-posix
>> error thus shows that the conflict also moves from the source to the
>> target.  It’s OK because we actually don’t have a conflict on the source.
>>
>> But I wonder whether it would be better for test_dismiss_collision() to
>> do a blockdev-backup instead where we can see the collision on the target.
>>
>> Hm.  On second thought, why do we even get a conflict on the target?
>> block-copy does share the WRITE permission for it...
> 
> Not sure, but assume that this is because in file-posix.c in raw_co_create
> we do want RESIZE perm.
> 
> I can instead move this test to use specified job-id, to move the collision
> to "job-id already exists" error. Is it better?

It’s probably the only thing that actually makes sense.

> I'm afraid that posix locks will not work if disable them in config.

Yes (or if the environment doesn’t support them); which is why I
suggested using blockdev-backup.  (But that would have the same problem
of “Where does the permission conflict even come from and is it
inevitable?”)

[...]

>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>> index 6828c46ba0..f3102ec3ff 100644
>>> --- a/block/block-copy.c
>>> +++ b/block/block-copy.c
>>> @@ -98,27 +98,32 @@ fail:
>>>    * error occurred, return a negative error number
>>>    */
>>>   static int coroutine_fn block_copy_with_bounce_buffer(
>>> -        BlockCopyState *s, int64_t start, int64_t end, bool 
>>> is_write_notifier,
>>> +        BlockCopyState *s, int64_t start, int64_t end,
>>>           bool *error_is_read, void **bounce_buffer)
>>>   {
>>>       int ret;
>>> -    int nbytes;
>>> -    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>>> +    int nbytes = MIN(s->cluster_size, s->len - start);
>>> +    void *lock = bdrv_co_try_lock(blk_bs(s->source), start, nbytes);
>>> +
>>> +    /*
>>> +     * Function must be called only on full-dirty region, so nobody 
>>> touching or
>>> +     * touched these bytes. Therefore, we must successfully get lock.
>>
>> Which is OK because backup-top does not share the WRITE permission.  But
>> it is organized a bit weirdly, because it’s actually block-copy here
>> that relies on the WRITE permission not to be shared; which it itself
>> cannot do because backup-top needs it.
>>
>> This of course results from the fact that backup-top and block-copy
>> should really use the same object to access the source, namely the
>> backing BdrvChild of backup-top.
>>
>> Maybe there should be a comment somewhere that points this out?
> 
> But I wanted block-copy not to be directly connected to backup-top.

But it doesn’t really work.  There’s an obscure contract that already
binds block-copy to backup-top.

> I think actually we rely on the fact that user of block-copy() guarantees that
> nobody is touching dirty areas (except block_copy). And with backup, this is 
> done by
> backup-top filter. I'll add a comment on this.

Yes, that’s the contract.  And it’s weird, because block-copy has its
own BB, so if it doesn’t want anyone to write to the source outside of
its control, it should just unshare the WRITE permission – but it cannot
do that, because its user needs WRITE access, and so it implicitly
relies on that user to then unshare WRITE.

If both used the same single BdrvChild, it would be clear

(1) why the BdrvChild belongs to block-copy’s user (because block-copy’s
user would need to own a node that then has this BdrvChild),

(2) why block-copy does not unshare the WRITE permission (it cannot,
because it doesn’t create the BdrvChild),

(3) how block-copy could explicitly ensure that WRITE is unshared (by
asserting !(child->shared_perm & WRITE)).


I’m not saying that we need to abandon having BBs right now, but I think
there are a couple of cases which show why I say it’s uglier than using
BdrvChildren instead.

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]