qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-ppc] [PATCH] target/ppc: only save guest timebase once after s


From: Michael Roth
Subject: Re: [Qemu-ppc] [PATCH] target/ppc: only save guest timebase once after stopping
Date: Fri, 04 May 2018 07:18:13 -0500
User-agent: alot/0.6

Quoting Greg Kurz (2018-05-04 04:37:24)
> On Thu,  3 May 2018 23:20:44 -0500
> Michael Roth <address@hidden> wrote:
> 
> > In some cases (e.g. spapr) we record guest timebase after qmp_stop()
> > via a runstate hook so we can restore it on qmp_cont(). If a migration
> > occurs in between those events we end up saving it again, this time
> > based on the current timebase the guest would be seeing had it been
> > running. This has the effect of advancing the guest timebase while
> > it is stopped, which is not what the code intends.
> > 
> 
> Hi Mike,
> 
> The current behavior was introduced by:
> 
> commit 42043e4f1241eeb77f87f5816b5cf0b6e9583ed7
> Author: Laurent Vivier <address@hidden>
> Date:   Fri Jan 27 13:24:58 2017 +0100
> 
>     spapr: clock should count only if vm is running
> 
> and we have this in the changelog:
> 
>     We keep timebase_pre_save to reduce the clock difference on
>     migration like in:
>         6053a86 kvmclock: reduce kvmclock difference on migration
> 
> 
> So your patch totally negates ^^ ? Also, I can't see a case where

Yah... this is a bit confusing. On one hand, the patch/summary is clearly
trying to avoid the guest time from advancing while it is stopped, which
is in the spirit of this patch. But at the same time it is trying to
compensate for loss of time (relative to host) due to downtime window.

I think the subtlety is in the amount of time... saving at pre_save
rather than vm_stop() compensates for the normal downtime window, which
is *usually* small (5s is the figure they quote in the notes there and
in the motivating 6053a86 "kvmclock: reduce kvmclock difference on
migration"). The delays between vm_stop and vm_cont via something like
virsh suspend/resume is unbounded, unhowever, hence the rationale for
the runstate hook (?).

So maybe small jumps are considered okay, and large ones not? If that's
the reasoning, then this patch is addressing the later, so it's not
necessarily in conflict with that motivation, but the implementation
does negate the small jumps we try to avoid via pre_save hook since
we'll end up keep the version we saved just after vm_stop instead.

I would note that the downtime window itself, while usually small, can
also be quite large. With 1GB hugepages we've seen some guests requiring
downtime windows to be set to 25s until QEMU would start cut-over. Also
rcu_cpu_stall_timeout is configurable...it's possible if we set it to
5s it could trigger on the jump the guest experiences from pre_save (I
haven't tested that though).

Maybe trying to compensate for downtime is a generally bad idea and we
should just leave it up to NTP/etc? Or maybe we should choose a specific
upper bound on how much migration downtime we're willing to compensate
for and enforce that directly? E.g. tb->saved becomes tb->saved_time and
we check the difference in pre_save before calling timebase_save()
again.

> So your patch totally negates ^^ ? Also, I can't see a case where
> timebase_save() could be called from vmstate_save_state() while the
> VM is running, ie, you could drop timebase_pre_save()... or am I
> *probably* missing something ?

Yah, I didn't notice that my patch completely negated the pre_save
hook... for some reason I was thinking that would continue to function
normally if we didn't call qmp_stop() explicitly but that's clearly not
the case. So yah, dropping timebase_pre_save() is essentially what my
patch is doing...

> 
> > Other than simple jumps in time, this has been seen to trigger what
> > appear to be RCU-related crashes in recent kernels when the advance
> > exceeds rcu_cpu_stall_timeout, and it can be triggered by fairly
> > common operations such as `virsh migrate ... --timeout 60`.
> > 
> > Cc: Alexey Kardashevskiy <address@hidden>
> > Cc: David Gibson <address@hidden>
> > Cc: Laurent Vivier <address@hidden>
> > Cc: address@hidden
> > Cc: address@hidden
> > Signed-off-by: Michael Roth <address@hidden>
> > ---
> >  hw/ppc/ppc.c         | 12 ++++++++++++
> >  target/ppc/cpu-qom.h |  1 +
> >  2 files changed, 13 insertions(+)
> > 
> > diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> > index ec4be25f49..ff0a107864 100644
> > --- a/hw/ppc/ppc.c
> > +++ b/hw/ppc/ppc.c
> > @@ -865,6 +865,15 @@ static void timebase_save(PPCTimebase *tb)
> >      uint64_t ticks = cpu_get_host_ticks();
> >      PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
> >  
> > +    /* since we generally save timebase just after the guest
> > +     * has stopped, avoid trying to save it again since we will
> > +     * end up advancing it by the amount of ticks that have
> > +     * elapsed in the host since the initial save
> > +     */
> > +    if (tb->saved) {
> > +        return;
> > +    }
> > +
> >      if (!first_ppc_cpu->env.tb_env) {
> >          error_report("No timebase object");
> >          return;
> > @@ -877,6 +886,7 @@ static void timebase_save(PPCTimebase *tb)
> >       * there is no need to update it from KVM here
> >       */
> >      tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
> > +    tb->saved = true;
> >  }
> >  
> >  static void timebase_load(PPCTimebase *tb)
> > @@ -908,6 +918,8 @@ static void timebase_load(PPCTimebase *tb)
> >                          &pcpu->env.tb_env->tb_offset);
> >  #endif
> >      }
> > +
> > +    tb->saved = false;
> >  }
> >  
> >  void cpu_ppc_clock_vm_state_change(void *opaque, int running,
> > diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
> > index deaa46a14b..ec2dbcdcae 100644
> > --- a/target/ppc/cpu-qom.h
> > +++ b/target/ppc/cpu-qom.h
> > @@ -210,6 +210,7 @@ typedef struct PowerPCCPUClass {
> >  typedef struct PPCTimebase {
> >      uint64_t guest_timebase;
> >      int64_t time_of_the_day_ns;
> > +    bool saved;
> >  } PPCTimebase;
> >  
> >  extern const struct VMStateDescription vmstate_ppc_timebase;
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]