[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming fro
From: |
Christoffer Dall |
Subject: |
Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop |
Date: |
Tue, 12 Mar 2019 13:25:28 +0100 |
User-agent: |
Mutt/1.5.24 (2015-08-30) |
[Adding Steven Price, who has recently looked at this, in cc]
On Tue, Mar 12, 2019 at 10:08:47AM +0000, Peter Maydell wrote:
> On Tue, 12 Mar 2019 at 06:10, Heyi Guo <address@hidden> wrote:
> >
> > When we stop a VM for more than 30 seconds and then resume it, by qemu
> > monitor command "stop" and "cont", Linux on VM will complain of "soft
> > lockup - CPU#x stuck for xxs!" as below:
> >
> > [ 2783.809517] watchdog: BUG: soft lockup - CPU#3 stuck for 2395s!
> > [ 2783.809559] watchdog: BUG: soft lockup - CPU#2 stuck for 2395s!
> > [ 2783.809561] watchdog: BUG: soft lockup - CPU#1 stuck for 2395s!
> > [ 2783.809563] Modules linked in...
> >
> > This is because Guest Linux uses generic timer virtual counter as
> > a software watchdog, and CNTVCT_EL0 does not stop when VM is stopped
> > by qemu.
> >
> > This patch is to fix this issue by saving the value of CNTVCT_EL0 when
> > stopping and restoring it when resuming.
>
> Hi -- I know we have issues with the passage of time in Arm VMs
> running under KVM when the VM is suspended, but the topic is
> a tricky one, and it's not clear to me that this is the correct
> way to fix it. I would prefer to see us start with a discussion
> on the kvm-arm mailing list about the best approach to the problem.
>
> I've cc'd that list and a couple of the Arm KVM maintainers
> for their opinion.
>
> QEMU patch left below for context -- the brief summary is that
> it uses KVM_GET_ONE_REG/KVM_SET_ONE_REG on the timer CNT register
> to save it on VM pause and write that value back on VM resume.
>
> thanks
> -- PMM
>
>
>
> > Cc: Peter Maydell <address@hidden>
> > Signed-off-by: Heyi Guo <address@hidden>
> > ---
> > target/arm/cpu.c | 65
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 65 insertions(+)
> >
> > diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> > index 96f0ff0..7bbba3d 100644
> > --- a/target/arm/cpu.c
> > +++ b/target/arm/cpu.c
> > @@ -896,6 +896,60 @@ static void arm_cpu_finalizefn(Object *obj)
> > #endif
> > }
> >
> > +static int get_vcpu_timer_tick(CPUState *cs, uint64_t *tick_at_pause)
> > +{
> > + int err;
> > + struct kvm_one_reg reg;
> > +
> > + reg.id = KVM_REG_ARM_TIMER_CNT;
> > + reg.addr = (uintptr_t) tick_at_pause;
> > +
> > + err = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
> > + return err;
> > +}
> > +
> > +static int set_vcpu_timer_tick(CPUState *cs, uint64_t tick_at_pause)
> > +{
> > + int err;
> > + struct kvm_one_reg reg;
> > +
> > + reg.id = KVM_REG_ARM_TIMER_CNT;
> > + reg.addr = (uintptr_t) &tick_at_pause;
> > +
> > + err = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
> > + return err;
> > +}
> > +
> > +static void arch_timer_change_state_handler(void *opaque, int running,
> > + RunState state)
> > +{
> > + static uint64_t hw_ticks_at_paused;
> > + static RunState pre_state = RUN_STATE__MAX;
> > + int err;
> > + CPUState *cs = (CPUState *)opaque;
> > +
> > + switch (state) {
> > + case RUN_STATE_PAUSED:
> > + err = get_vcpu_timer_tick(cs, &hw_ticks_at_paused);
> > + if (err) {
> > + error_report("Get vcpu timer tick failed: %d", err);
> > + }
> > + break;
> > + case RUN_STATE_RUNNING:
> > + if (pre_state == RUN_STATE_PAUSED) {
> > + err = set_vcpu_timer_tick(cs, hw_ticks_at_paused);
> > + if (err) {
> > + error_report("Resume vcpu timer tick failed: %d", err);
> > + }
> > + }
> > + break;
> > + default:
> > + break;
> > + }
> > +
> > + pre_state = state;
> > +}
> > +
> > static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
> > {
> > CPUState *cs = CPU(dev);
> > @@ -906,6 +960,12 @@ static void arm_cpu_realizefn(DeviceState *dev, Error
> > **errp)
> > Error *local_err = NULL;
> > bool no_aa32 = false;
> >
> > + /*
> > + * Only add change state handler for arch timer once, for KVM will
> > help to
> > + * synchronize virtual timer of all VCPUs.
> > + */
> > + static bool arch_timer_change_state_handler_added;
> > +
> > /* If we needed to query the host kernel for the CPU features
> > * then it's possible that might have failed in the initfn, but
> > * this is the first point where we can report it.
> > @@ -1181,6 +1241,11 @@ static void arm_cpu_realizefn(DeviceState *dev,
> > Error **errp)
> >
> > init_cpreg_list(cpu);
> >
> > + if (!arch_timer_change_state_handler_added && kvm_enabled()) {
> > + qemu_add_vm_change_state_handler(arch_timer_change_state_handler,
> > cs);
> > + arch_timer_change_state_handler_added = true;
> > + }
> > +
> > #ifndef CONFIG_USER_ONLY
> > if (cpu->has_el3 || arm_feature(env, ARM_FEATURE_M_SECURITY)) {
> > cs->num_ases = 2;
> > --
> > 1.8.3.1
- [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop, Heyi Guo, 2019/03/12
- Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop, no-reply, 2019/03/12
- Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop, Heyi Guo, 2019/03/12
- Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop, Peter Maydell, 2019/03/12
- Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop,
Christoffer Dall <=
- Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop, Marc Zyngier, 2019/03/12
- Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop, Steven Price, 2019/03/13
- Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop, Heyi Guo, 2019/03/14
- Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop, Christoffer Dall, 2019/03/15
- Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop, Heyi Guo, 2019/03/19
- Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop, Steven Price, 2019/03/20
- Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop, Heyi Guo, 2019/03/26
- Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop, Heyi Guo, 2019/03/26