From: John Snow
Subject: Re: [Qemu-devel] [PATCH 07/12] block/backup: add 'always' bitmap sync policy
Date: Fri, 21 Jun 2019 16:58:01 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0


On 6/21/19 9:44 AM, Vladimir Sementsov-Ogievskiy wrote:
> 21.06.2019 16:08, Vladimir Sementsov-Ogievskiy wrote:
>> 21.06.2019 15:59, Vladimir Sementsov-Ogievskiy wrote:
>>> 21.06.2019 15:57, Vladimir Sementsov-Ogievskiy wrote:

^ Home Run!

I'm going to reply to all four of these mails at once below; I'm sorry
for all the words, but I want to make sure I am being clear about my intent.

>>>> 20.06.2019 4:03, John Snow wrote:
>>>>> This adds an "always" policy for bitmap synchronization. Regardless of if
>>>>> the job succeeds or fails, the bitmap is *always* synchronized. This means
>>>>> that for backups that fail part-way through, the bitmap retains a record of
>>>>> which sectors need to be copied out to accomplish a new backup using the
>>>>> old, partial result.
>>>>>
>>>>> In effect, this allows us to "resume" a failed backup; however the new backup
>>>>> will be from the new point in time, so it isn't a "resume" as much as it is
>>>>> an "incremental retry." This can be useful in the case of extremely large
>>>>> backups that fail considerably through the operation and we'd like to not waste
>>>>> the work that was already performed.
>>>>>
>>>>> Signed-off-by: John Snow <address@hidden>
>>>>> ---
>>>>>   qapi/block-core.json |  5 ++++-
>>>>>   block/backup.c       | 10 ++++++----
>>>>>   2 files changed, 10 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>>> index 0332dcaabc..58d267f1f5 100644
>>>>> --- a/qapi/block-core.json
>>>>> +++ b/qapi/block-core.json
>>>>> @@ -1143,6 +1143,9 @@
>>>>>   # An enumeration of possible behaviors for the synchronization of a bitmap
>>>>>   # when used for data copy operations.
>>>>>   #
>>>>> +# @always: The bitmap is always synchronized with remaining blocks to copy,
>>>>> +#          whether or not the operation has completed successfully or not.
>>>>
>>>> Hmm, now I think that 'always' sounds a bit like 'really always', i.e. during
>>>> backup too, which is confusing.. But I don't have a better suggestion.
>>>>

I could probably clarify it to say "at the conclusion of the operation",
but we should also keep in mind that bitmaps tied to an operation can't
be used during that timeframe anyway.

>>>>> +#
>>>>>   # @conditional: The bitmap is only synchronized when the operation is successul.
>>>>>   #               This is useful for Incremental semantics.
>>>>>   #
>>>>> @@ -1153,7 +1156,7 @@
>>>>>   # Since: 4.1
>>>>>   ##
>>>>>   { 'enum': 'BitmapSyncMode',
>>>>> -  'data': ['conditional', 'never'] }
>>>>> +  'data': ['always', 'conditional', 'never'] }
>>>>>   ##
>>>>>   # @MirrorCopyMode:
>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>> index 627f724b68..beb2078696 100644
>>>>> --- a/block/backup.c
>>>>> +++ b/block/backup.c
>>>>> @@ -266,15 +266,17 @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
>>>>>       BlockDriverState *bs = blk_bs(job->common.blk);
>>>>>       if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
>>>>> -        /* Failure, or we don't want to synchronize the bitmap.
>>>>> -         * Merge the successor back into the parent, delete nothing. */
>>>>> +        /* Failure, or we don't want to synchronize the bitmap. */
>>>>> +        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
>>>>> +            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
>>>>> +        }
>>>>> +        /* Merge the successor back into the parent. */
>>>>>           bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
>>>>
>>>> Hmm good, it should work. It's a lot more tricky than just "synchronized with
>>>> remaining blocks to copy", but I'm not sure we need more details in the spec.
>>>>

Right, it's complicated because backups involve two points in time: the
start and the finish of the operation. The actual technical truth of what
happens is hard to phrase succinctly.

It was difficult to phrase even for the normal Incremental/conditional
mode that we have.

I can't help but feel I need to write a blog post that has some good
diagrams that can be used to explain the concept clearly.

>>>> What do we have in the backup, then? On one hand we have an incremental backup
>>>> and a bitmap counting from it. On the other hand it's not a normal incremental
>>>> backup, as it doesn't correspond to any valid state of the VM disk, and it may
>>>> only be used as a backing image in a chain of further successful incremental
>>>> backups, yes?
>>>>

You can also continue writing directly into it, which is likely the
smarter choice because it saves you the trouble of doing an intermediate
block commit later, and then you don't keep any image files that are
"meaningless" by themselves.

However, yes, iotest 257 uses them as backing images.

>>>> And then I think: with this mode we could avoid stopping on the first error and
>>>> instead ignore it, just leaving the dirty bit set in the resulting bitmap. We
>>>> have BLOCKDEV_ON_ERROR_IGNORE, which could be used to achieve this, but it seems
>>>> it doesn't work as expected, since in backup_loop() we retry the operation if
>>>> ret < 0 and action != BLOCK_ERROR_ACTION_REPORT.
>>>>

This strikes me as a good idea, but I wonder: if we already retry for
'ignore', transient network errors will likely recover on their own as a
result. Are there cases where we really want
the job to move forward, because we expect certain sectors will never
copy correctly, like reading from unreliable media? Are those cases ones
we expect to be able to fix later?

(Actually, what happens if we ignore errors and we get stuck on a
sector? How many times do we retry this before we give up and admit that
it's actually an error we can't ignore?)
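
(Looking at the loop Vladimir mentions, the answer today seems to be
"indefinitely". Paraphrasing the copy loop in block/backup.c from memory,
not quoting it exactly, it is roughly this shape:

    do {
        if (yield_and_check(job)) {
            return 0;   /* job was paused or cancelled */
        }
        ret = backup_do_cow(job, offset, job->cluster_size,
                            &error_is_read, false);
        if (ret < 0 &&
            backup_error_action(job, error_is_read, -ret) ==
            BLOCK_ERROR_ACTION_REPORT) {
            return ret; /* 'report': fail the whole job */
        }
    } while (ret < 0);  /* 'ignore'/'stop': retry the same cluster */

so with on-error=ignore we just spin on the same cluster until it succeeds;
nothing ever skips it or records it as permanently failed.)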

The use cases aren't clear to me right away, but it's worth looking into
because it sounds like it could be useful. I think that should not be
part of this series, however.

>>>> And another thought: can the user decide between discarding (like CONDITIONAL)
>>>> and saving in a backing chain (like ALWAYS) a failed backup result _after_ the
>>>> backup job completes? For example, for a small resulting backup it may be better
>>>> to discard it, and for a large one, to save it.

That seems complicated, because you need to keep the bitmap split into
its component subsets (original, copy manifest, and writes since start)
all the way until AFTER the job, which means more bitmap management
commands that need to be issued after the job is done.

Which means the job would move the bitmap into yet another new state
where it is "busy" and cannot be used, but is awaiting some kind of
rollover command from the user.

However, you could also just use our 'merge' support to make a copy of
the bitmap before you begin and use the 'always' sync mode; then, if you
decide it's not worth restarting after the fact, you can just delete the
copy.

>>>> Will it work if we start the job with ALWAYS mode and autocomplete = false? Then
>>>> on failure we can look at the job progress, and if it is small we cancel the
>>>> job, otherwise we call complete. Or wait, will block-job-complete not work in
>>>> failure scenarios? Then we have to set BLOCKDEV_ON_ERROR_IGNORE, and on the
>>>> first error event decide whether to cancel or not? But we can only cancel or
>>>> continue..
>>>>
>>>> Hmm. Cancel. So on cancel and abort you synchronize the bitmap too? That seems
>>>> at odds with what cancel should do, and with transactions in general...
>>>
>>> I mean grouped transaction mode, how should it work with this?
>>
>> Actually the problem is that you want to implement partial success, and the
>> block jobs API and the transactions API are not prepared for such a thing.

I wouldn't call it partial success, but rather a "failure with detailed
error log" -- but I concede I am playing games with terminology.

The operation failed, but the bitmap can be considered a record of
exactly which regions were not successfully copied.

You're right, though; the regions that got cleared could be considered a
record of partial success, but I think I'd resist formalizing that in a
new API. I think it's easier to conceptualize it as a recoverable failure,
with the bitmap acting as the resume/recovery data.

> 
> 
> Should it be OK if we just:
> 
> 1. restrict using ALWAYS together with grouped transaction mode, so we don't
>    need to deal with other job failures.
> 2. don't claim but only reclaim on cancel, even in ALWAYS mode, to make cancel
>    roll back all things
> 
> ?
> 
> 

> "grouped transaction mode, how should it work with this?"

With or without the grouped completion mode, it does the same thing: it
ALWAYS synchronizes!

Yes, that means that:

1. In the case of user cancellation, it still saves partial work.
2. In the case of an induced cancellation from a peer job, it saves
partial work.

I think this behavior is correct because grouped completion mode does
not actually guarantee that jobs that are already running clean up
completely as if they were never launched; that is, we cannot undo the
fact that we DID copy data out to a target. Therefore, because we
factually DID perform some work, this mode simply preserves a record of
what DID occur, in the case that the client prefers to salvage partial
work instead of restarting from scratch.

In other words, the fact that we launched this as part of a transaction
does not seem like a good reason to negate the user's intention to be able
to resume from failures if they occur.

I realize this seems like a strange usage of a transaction -- because
state from the transaction can escape a failed transaction -- but real
database systems in practice do allow partial unwinds to minimize the
amount of work that needs to be redone. I don't think
this is too surprising -- and it happens ONLY when the user specifically
requests it, so I believe this is safe behavior.

I do agree that it *probably* doesn't make sense to use these together
-- it will work just fine, but why would you be okay with one type of
partial completion when you're not ok with another? I don't understand
why you'd ask for it, but I think it will do exactly what you ask for.
It's actually less code to just let it work like this.


> "So on cancel and abort you synchronize bitmap too?"

I will concede that this means that if you ask for a bitmap backup with
the 'always' policy and, for whatever reason, change your mind about it,
there's no way to "cancel" the job in a manner that does not edit the
bitmap at this point.

I do agree that this seems to go against the wishes of the user, because
we have different "kinds" of cancellations:

A) Cancellations that actually represent failures in transactions
B) Cancellations that represent genuine user intention

It might be nice to allow the user to say "No wait, please don't edit
that bitmap, I made a mistake!"

In these cases, how about a slight overloading of the "forced" user
cancellation? For mirror, a "forced" cancel means "Please don't try to
sync the mirror, just exit immediately." whereas an unforced cancel
means "Please complete!"

For backup, it could mean something similar:

force: "Please exit immediately, don't even sync the bitmap!"
!force: "Exit, but proceed with normal cleanup."

This would mean that in the grouped completion failure case, we actually
still sync the bitmap, but if the user sends a forced-cancel, they can
actually abort the entire transaction.

Actually, that needs just a minor edit:

job.c:746:

job_cancel_async(other_job, false);

should become:

job_cancel_async(other_job, job->force_cancel);


And then the bitmap cleanup code can check for this condition to avoid
synchronizing the bitmap in these cases.
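
Roughly, I am imagining something like this in backup_cleanup_sync_bitmap()
(just a sketch, not part of this series, and the exact field path for the
force flag is a guess on my part):

    if (ret < 0 || job->bitmap_mode == BITMAP_SYNC_MODE_NEVER) {
        /* Failure, or we don't want to synchronize the bitmap. */
        if (job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS &&
            !job->common.job.force_cancel) {
            /* Keep a record of what still needs to be copied ... */
            bdrv_dirty_bitmap_claim(job->sync_bitmap, &job->copy_bitmap);
        }
        /* ... and merge the successor back into the parent as before. */
        bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
    }

so a forced cancel really would leave the bitmap untouched, as if the
backup had never been attempted.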


