Re: [Qemu-devel] Migration auto-converge problem

From: John Snow
Subject: Re: [Qemu-devel] Migration auto-converge problem
Date: Wed, 11 Mar 2015 19:23:44 -0400
On 03/02/2015 04:04 PM, Jason J. Herne wrote:
We have a test case that dirties memory very very quickly. When we run
this test case in a guest and attempt a migration, that migration never
converges even when done with auto-converge on.

The auto converge behavior of Qemu functions differently than I
had expected. In my mind, I expected auto converge to continuously apply
adaptive throttling of the cpu utilization of a busy guest if Qemu
detects that progress is not being made quickly enough in the guest
memory transfer. The idea is that a guest dirtying pages too quickly
will be adaptively slowed down by the throttling until migration is able
to transfer pages fast enough to complete the migration within the max
downtime. Qemu's current auto converge does not appear to do this in

A quick look at the source code shows the following:
- Autoconverge keeps a counter. This counter is only incremented if, for
a completed memory pass, the guest is dirtying pages at a rate of 50%
(or more) of our transfer rate.
- The counter only increments at most once per pass through memory.
- The counter must reach 4 before any throttling is done. (a minimum of
4 memory passes have to occur)
- Once the counter reaches 4, it is immediately reset to 0, and then
throttling action is taken.
- Throttling occurs by doing an async sleep on each guest cpu for 30ms,
exactly one time.
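
To make the list above concrete, here is a rough model of that logic as I read it. This is only a sketch in Python, not the actual Qemu code: the class, method, and constant names are made up for illustration, and only the numbers (50%, 4 passes, 30ms) come from the description above.

```python
from dataclasses import dataclass

# Numbers from the list above; all names here are hypothetical.
DIRTY_RATIO_THRESHOLD = 0.5   # dirty rate must be >= 50% of the transfer rate
PASSES_BEFORE_THROTTLE = 4    # counter value that triggers throttling
SLEEP_MS = 30                 # one-shot sleep applied to each guest cpu


@dataclass
class AutoConvergeModel:
    counter: int = 0
    total_sleep_ms: int = 0

    def end_of_pass(self, dirty_rate: float, transfer_rate: float) -> bool:
        """Call once per completed memory pass; returns True if we throttled."""
        # The counter moves at most once per pass through memory.
        if dirty_rate >= DIRTY_RATIO_THRESHOLD * transfer_rate:
            self.counter += 1
        if self.counter >= PASSES_BEFORE_THROTTLE:
            self.counter = 0                  # reset first ...
            self.total_sleep_ms += SLEEP_MS   # ... then sleep exactly once
            return True
        return False


# A guest dirtying at 90% of the transfer rate for 8 passes in a row.
model = AutoConvergeModel()
throttled = [model.end_of_pass(dirty_rate=900.0, transfer_rate=1000.0)
             for _ in range(8)]
print(throttled, model.total_sleep_ms)
```

Even with every pass counting as "not converging", eight full passes through memory buy a total of 60ms of guest sleep per cpu.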

Now consider the scenario auto-converge is meant to solve (I think): A
guest touching lots of memory very quickly. Each pass through memory is
going to be sending a lot of pages, and thus, taking a decent amount of
time to complete. If, for every four passes, we are *only* sleeping the
guest for 30ms, our guest is still going to be able to dirty pages faster
than we can transfer them. We will never catch up because the sleep time
relative to guest execution time is very very small.
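
A quick back-of-the-envelope calculation of how small that is. The 30ms and "every four passes" come from the text above; the pass durations are assumed values picked for illustration, since the real duration depends on guest size and link speed. The function name is hypothetical.

```python
def sleep_fraction(pass_seconds: float,
                   passes_per_throttle: int = 4,
                   sleep_ms: float = 30.0) -> float:
    """Fraction of wall-clock time the guest spends asleep."""
    sleep_s = sleep_ms / 1000.0
    # One throttle cycle = four full memory passes plus one 30ms sleep.
    return sleep_s / (passes_per_throttle * pass_seconds + sleep_s)


# Assumed pass durations, for illustration only.
for pass_seconds in (1.0, 5.0, 10.0):
    pct = 100.0 * sleep_fraction(pass_seconds)
    print(f"{pass_seconds:4.1f}s per pass -> guest asleep {pct:.3f}% of the time")
```

With passes that take even one second each, the guest is asleep for well under 1% of the time.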

Auto converge, as it is implemented today, does not address the problem
I expect it to solve. However, after rapidly prototyping a new version of
auto converge that performs adaptive modeling I've learned something.
The workload I'm attempting to migrate is actually a pathological case.
It is an excellent example of why throttling cpu is not always a good
method of limiting memory access. In this test case we are able to touch
over 600 MB of pages in 50 ms of continuous execution. In this case,
even if I throttle the guest to 5% (50ms runtime, 950ms sleep) we still
cannot even come close to catching up even with a fairly speedy network
link (which not every user will have).
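
The arithmetic behind that claim, as a small sketch. The 600 MB, 50ms, and 950ms figures are the ones quoted above. The 1 Gbit/s link is purely an assumed example, because no link speed is given here. It also assumes every touched page has to be resent and ignores protocol overhead, so treat the result as a lower bound.

```python
DIRTIED_MB_PER_BURST = 600.0   # "over 600 MB", so this is a lower bound
RUN_MS = 50.0                  # guest runtime per period at a 5% duty cycle
SLEEP_MS = 950.0               # guest sleep per period


def throttled_dirty_rate_mb_s(dirtied_mb: float, run_ms: float, sleep_ms: float) -> float:
    """Average MB dirtied per second of wall-clock time while throttled."""
    period_s = (run_ms + sleep_ms) / 1000.0
    return dirtied_mb / period_s


def link_mb_s(gbit_per_s: float) -> float:
    """Raw link throughput in MB/s (decimal units, no protocol overhead)."""
    return gbit_per_s * 1000.0 / 8.0


dirty = throttled_dirty_rate_mb_s(DIRTIED_MB_PER_BURST, RUN_MS, SLEEP_MS)
required_gbit = dirty * 8.0 / 1000.0
assumed_link = link_mb_s(1.0)   # hypothetical 1 Gbit/s link

print(f"dirty rate at 5% runtime: {dirty:.0f} MB/s")
print(f"link needed just to keep pace: {required_gbit:.1f} Gbit/s")
print(f"assumed 1 Gbit/s link moves: {assumed_link:.0f} MB/s")
```

So even at a 5% duty cycle the guest dirties at least 600 MB every second, which needs roughly 4.8 Gbit/s of sustained transfer just to break even.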

Given the above, I believe that some workloads touch memory too fast and
we'll never be able to live migrate them with auto-converge. On the
lower end there are workloads that have a very small/stagnant working
set size which will be live migratable without the need for
auto-converge. Lastly, we have "the nebulous middle". These are
workloads that would benefit from auto-converge because they touch pages
too fast for migration to be able to deal with them, AND (important
conditional here), throttling will (may?) actually reduce their rate of
page modifications. I would like to try and define this "middle" set of

A question with no obvious answer: How much throttling is acceptable? If
I have to throttle a guest 90% and he ends up failing 75% of whatever
transactions he is attempting to process then we have quite likely
defeated the entire purpose of "live" migration. Perhaps it would be
better in this case to just stop the guest and do a non-live migration.
Maybe by reverting to non-live we actually save time and thus more
transactions would have completed. This one may take some experimenting
to be able to get a good idea for what makes the most sense. Maybe even
have max throttling be user configurable.

With all this said, I still wonder exactly how big this "nebulous
middle" really is. If, in practice, that "middle" only accounts for 1%
of the workloads out there then is it really worth spending time fixing
it? Keep in mind this is a two pronged test:
1. Guest cannot migrate because it changes memory too fast
2. Cpu throttling slows guest's memory writes down enough such that he
can now migrate

I'm interested in any thoughts anyone has. Thanks!

This is just a passing thought since I have not invested deeply in the live migration convergence mechanisms myself, but:

Is it possible to apply a progressively more brutish throttle to a guest if we detect we are not making (or indeed /losing/) progress?

We could start with no throttle and see how far we get, then progressively apply a tighter grip on the VM until we make satisfactory progress, then continue on until we hit our "Just pause it and ship the rest" threshold.

That way we allow ourselves the ability to throttle very naughty guests very aggressively (to the point of being effectively paused) without disturbing the niceness of our largely idle guests. In this way, even very high throttle caps should be acceptable.

This would allow live migration to "fail gracefully" back to essentially a paused migration for cases that are modifying memory or disk just too absurdly fast.
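
Roughly what I have in mind, as a toy Python sketch rather than anything resembling real Qemu code. Every name, the 10% step, and the 90% pause threshold are assumptions I picked for illustration. The simulation also assumes the guest's dirty rate scales linearly with its runtime, which (as noted earlier in the thread) may not hold for real workloads.

```python
def next_throttle(current_pct: int, dirty_rate: float, transfer_rate: float,
                  step_pct: int = 10, pause_threshold_pct: int = 90):
    """Return (new_throttle_pct, pause_guest) after one memory pass."""
    if dirty_rate < transfer_rate:
        return current_pct, False      # making progress: hold the current grip
    tightened = current_pct + step_pct
    if tightened >= pause_threshold_pct:
        return 100, True               # "just pause it and ship the rest"
    return tightened, False


def simulate(unthrottled_dirty_rate: float, transfer_rate: float,
             max_rounds: int = 20):
    """Toy guest whose dirty rate is assumed proportional to its runtime."""
    throttle = 0                       # start with no throttle at all
    for _ in range(max_rounds):
        dirty = unthrottled_dirty_rate * (100 - throttle) / 100
        new_throttle, paused = next_throttle(throttle, dirty, transfer_rate)
        if paused:
            return new_throttle, True
        if new_throttle == throttle:   # progress is satisfactory, stop tightening
            return throttle, False
        throttle = new_throttle
    return throttle, False


print(simulate(100, 1000))     # largely idle guest: never throttled
print(simulate(3000, 1000))    # busy guest: settles at some throttle level
print(simulate(50000, 1000))   # pathological guest: falls back to a pause
```

The point is only that idle guests never see any throttling, while the pathological ones ratchet up until they hit the pause threshold.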

I'll leave it to the migration wizards to explain why I am foolhardy.

