qemu-devel

From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH 4/6] dirty-bitmaps: clean-up bitmaps loading and migration logic
Date: Thu, 2 Aug 2018 13:23:32 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

02.08.2018 01:28, John Snow wrote:

On 08/01/2018 04:47 PM, Denis V. Lunev wrote:
On 08/01/2018 09:56 PM, John Snow wrote:
On 08/01/2018 02:42 PM, Denis V. Lunev wrote:
On 08/01/2018 08:40 PM, Dr. David Alan Gilbert wrote:
* John Snow (address@hidden) wrote:
On 08/01/2018 06:20 AM, Dr. David Alan Gilbert wrote:
* John Snow (address@hidden) wrote:

<snip>

I'd rather do something like this:
- Always flush bitmaps to disk on inactivate.
Does that increase the time taken by the inactivate measurably?
If it's small relative to everything else that's fine; it's just I
always worry a little since I think this happens after we've stopped the
CPU on the source, so is part of the 'downtime'.

Dave
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK

I'm worried that if we don't, we're leaving unusable, partially
complete files behind us. That's a bad design and we shouldn't push for
it just because it's theoretically faster.
Oh I don't care about theoretical speed; but if it's actually unusably
slow in practice then it needs fixing.

Dave
This is not "theoretical" speed. This is real practical speed and
instability.
It's theoretical until I see performance testing numbers; do you have
any? How much faster does the pivot happen by avoiding making the qcow2
consistent on close?

I don't argue that it's faster to just simply not write data, but what's
not obvious is how much time it actually saves in practice and if that's
worth doing unintuitive and undocumented things like "the source file
loses bitmaps after a migration because it was faster."
Also, frankly speaking, I do not understand the goal of this purism.

The goal of my series originally was just to limit some corner cases. At
the time it was not evident that avoiding a flush was a *goal* of that
series rather than a *side-effect* or a means to an end (avoiding
migrating a bitmap over two different channels).

It's not a goal and it's not a side-effect. The flush is already avoided in current upstream code, and your series is a degradation. The goal is different: to make the whole thing more obvious and to make it possible to restore bitmap persistence after a failed migration.


It was not immediately obvious to me that intentionally leaving behind
partially flushed qcow2 files was expected behavior. I still think it's
probably not the best behavior in general, but it's also not really
catastrophic either. If you had benchmarks it'd be useful to show an
obvious benefit to doing something unconventional.

In this case, I *do* consider not writing metadata back out to disk on
close something "unconventional."

Clearly my series missed an important case, so it can't be used
at all, and the status quo is also broken for several cases and also
cannot be used. With your performance concerns in mind, I'm looking at
Vladimir's series again. It might just require some more concise
comments explaining why you're taking the exact approach that you are.

--js

There are 2 main cases: shared and non-shared storage. On shared
storage:
- normally migration finishes successfully. Source is shut down,
  target is started. The data in the file on shared storage would be
  __IMMEDIATELY__ marked as stale on target, i.e. you will save CBT
  on source (with IO over networked FS), load CBT on target (with IO
  over networked FS), mark CBT as stale (IO again). The CBT data
  written is marked as lost.
- failed migration. OK, we have CBT data written on source, CBT
  data read on source, CBT data marked stale. Thus any CBT on
  disk while the VM is running is pure overhead.

The same applies to non-shared migration, where the situation is
even worse: you save CBT and throw it away once migration
completes.
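
The I/O accounting in the cases above can be written down as a small toy model. This is only a sketch of the argument, not QEMU code: the function name `cbt_io_ops`, the `flush_on_inactivate` parameter and the operation labels are hypothetical, each "op" stands for one pass over the bitmap data or its metadata on storage, and the no-flush branch is assumed to do none of this I/O, which is the premise of the argument rather than a measurement.

```python
# Toy accounting of the bitmap (CBT) storage I/O described above.
# Not QEMU code: all names are hypothetical, for illustration only.
from collections import Counter


def cbt_io_ops(flush_on_inactivate: bool) -> Counter:
    """Count storage operations on CBT data around one migration attempt."""
    ops = Counter()
    if flush_on_inactivate:
        # Source saves CBT to the image when it is inactivated.
        ops["write"] += 1
        # CBT is loaded again: by the target on success,
        # by the source on a failed migration.
        ops["read"] += 1
        # The on-disk copy is marked stale as soon as the VM runs.
        ops["mark_stale"] += 1
    # Assumption: without the flush, none of the above touches storage.
    return ops


for policy in (True, False):
    ops = cbt_io_ops(policy)
    print(f"flush_on_inactivate={policy}: "
          f"{sum(ops.values())} storage ops {dict(ops)}")
```

The model counts operations only. It says nothing about how long each one takes, which is what the benchmark question earlier in the thread is asking about.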

Please also note that saving CBT gives almost no protection
against power loss: the power would have to be lost at a very
specific moment for the data to survive, and most likely
we will have to drop CBT completely.

Den



--
Best regards,
Vladimir



