[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] Problem with "savevm" on ppc64
From: |
David Gibson |
Subject: |
Re: [Qemu-ppc] Problem with "savevm" on ppc64 |
Date: |
Fri, 21 Oct 2016 16:26:27 +1100 |
User-agent: |
Mutt/1.7.0 (2016-08-17) |
On Thu, Oct 20, 2016 at 03:17:12PM +0200, Thomas Huth wrote:
> Hi all,
>
> I'm currently facing a strange problem with the "savevm" HMP command on
> ppc64 with TCG and the pseries machine. Steps for reproduction:
>
> 1) Create a disk image:
> qemu-img create -f qcow2 /tmp/test.qcow2 1M
>
> 2) Start QEMU (in TCG mode):
> qemu-system-ppc64 -nographic -vga none -m 256 -hda /tmp/test.qcow2
>
> 3) Hit "CTRL-a c" to enter the HMP monitor
>
> 4) Type "savevm test1" to save a snapshot
>
> The savevm command then hangs forever and the test.qcow2 image keeps
> growing and growing.
>
> It seems like qemu_savevm_state_iterate() does not make any more
> progress because ram_save_iterate() keeps returning 0 ... but why can
> that happen?
Hmm. You don't mention installing anything on the disk image, so I'm
assuming the VM is just sitting in SLOF, unable to boot. And it's
TCG, so there's no VRMA. Which mans I'd expect the HPT to be
completely empty.
I strongly suspect that's what's triggering the problem, although I'm
not quite sure of the mechanism
> I've tinkered with the code a little bit, and I can get it to work with
> the following patch - which however is quite independent of the
> ram_save_iterate() code:
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ddb7438..a7ac0bf 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1290,7 +1290,7 @@ static int htab_save_setup(QEMUFile *f, void *opaque)
> return 0;
> }
>
> -static void htab_save_first_pass(QEMUFile *f, sPAPRMachineState *spapr,
> +static int htab_save_first_pass(QEMUFile *f, sPAPRMachineState *spapr,
> int64_t max_ns)
> {
> bool has_timeout = max_ns != -1;
> @@ -1340,6 +1340,8 @@ static void htab_save_first_pass(QEMUFile *f,
> sPAPRMachineState *spapr,
> spapr->htab_first_pass = false;
> }
> spapr->htab_save_index = index;
> +
> + return !spapr->htab_first_pass;
> }
>
> static int htab_save_later_pass(QEMUFile *f, sPAPRMachineState *spapr,
> @@ -1444,7 +1446,7 @@ static int htab_save_iterate(QEMUFile *f, void
> *opaque)
> return rc;
> }
> } else if (spapr->htab_first_pass) {
> - htab_save_first_pass(f, spapr, MAX_ITERATION_NS);
> + rc = htab_save_first_pass(f, spapr, MAX_ITERATION_NS);
> } else {
> rc = htab_save_later_pass(f, spapr, MAX_ITERATION_NS);
> }
>
> That means, if htab_save_iterate() does not initially return a 0, then
> ram_save_iterate() also does not run into the condition of returning
> zeroes later. But how are these related? And is it safe for returning a
> non-zero value in htab_save_first_pass()? Does anybody with more
> experience in this area has a clue what's going on here?
So, looking at this I think it's unsafe. htab_save_first_pass() never
examines dirty bits, so we could get:
htab_save_first_pass() called once, saves part of HPT
guest dirties an HPTE in the already saved region
enter migration completion stage
htab_save_first_pass() saves the remainder of the HPT, returns 1
That would trigger the code to think the HPT migration is complete,
without ever saving the HPTE that got dirtied part way through.
But.. then I looked further and got *really* confused.
The comment above qemu_savevm_state_iterate() and the logic in
qemu_savevm_state() say that the iterate function returns >0 to
indicate that it is done and we can move onto the completion phase.
But both ram_save_iterate() and block_save_iterate() seem to have that
inverted: they return >0 if they actually saved something.
I think the only reason migration has been working *at all* is that
migration_thread() and qemu_savevm_state_complete_precopy() ignore the
return value of the iterate() function (except for <0 errors).
qemu_savevm_state(), however, does not.
So, I don't think this is actually a Power bug. But, it's masked on
machine types other than pseries, because they will almost never have
>1 iterable state handler. I think what must be happening is that
because we don't have time limits in the savevm case, the first call
to ram_save_iterate() from qemu_savevm_state_iterate() is saving all
of RAM. On x86 that would be the end of the QTAILQ_FOREACH(), we'd
return non-zero and that would kick qemu_savevm_state() into the
completion phase.
But on Power, we save all RAM, then move onto the HPT iterator. It
returns 0 on the first pass, and we loop again in
qemu_savevm_state(). But on the next call, RAM is all saved, so
ram_save_iterate now returns 0. That breaks us out of the FOREACH in
qemu_savevm_state_iterate(), returning 0. Repeat forever.
Juan, Dave, sound right?
I suspect the right fix is to make ram_save_iterate() return void.
Migration will use the bandwidth / downtime calculations to determine
completion as always. savevm would skip the "live" phase entirely and
move directly on to completion phase unconditionally.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature