[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-arm] [RFC] arm/cpu: fix soft lockup panic after resuming from

From: Heyi Guo
Subject: Re: [Qemu-arm] [RFC] arm/cpu: fix soft lockup panic after resuming from stop
Date: Wed, 13 Mar 2019 09:57:06 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.1

Dear all,

Really appreciate your comments and information. For "Live Physical Time", is 
there any document to tell the whole story? And may I know the timeline to make it happen?



On 2019/3/12 22:12, Marc Zyngier wrote:
Hi Peter,

On 12/03/2019 10:08, Peter Maydell wrote:
On Tue, 12 Mar 2019 at 06:10, Heyi Guo <address@hidden> wrote:
When we stop a VM for more than 30 seconds and then resume it, by qemu
monitor command "stop" and "cont", Linux on VM will complain of "soft
lockup - CPU#x stuck for xxs!" as below:

[ 2783.809517] watchdog: BUG: soft lockup - CPU#3 stuck for 2395s!
[ 2783.809559] watchdog: BUG: soft lockup - CPU#2 stuck for 2395s!
[ 2783.809561] watchdog: BUG: soft lockup - CPU#1 stuck for 2395s!
[ 2783.809563] Modules linked in...

This is because Guest Linux uses generic timer virtual counter as
a software watchdog, and CNTVCT_EL0 does not stop when VM is stopped
by qemu.

This patch is to fix this issue by saving the value of CNTVCT_EL0 when
stopping and restoring it when resuming.
Hi -- I know we have issues with the passage of time in Arm VMs
running under KVM when the VM is suspended, but the topic is
a tricky one, and it's not clear to me that this is the correct
way to fix it. I would prefer to see us start with a discussion
on the kvm-arm mailing list about the best approach to the problem.

I've cc'd that list and a couple of the Arm KVM maintainers
for their opinion.

QEMU patch left below for context -- the brief summary is that
it uses KVM_GET_ONE_REG/KVM_SET_ONE_REG on the timer CNT register
to save it on VM pause and write that value back on VM resume.
That's probably good enough for this particular use case, but I think
there is more. I can get into similar trouble if I suspend my laptop, or
suspend QEMU. It also has the slightly bizarre effect of skewing time,
and this will affect timekeeping in the guest in ways that are much more
subtle than just shouty CPUs.

Christoffer and Steve had some stuff regarding Live Physical Time, which
should cover that, and other cases such as host system suspend, and QEMU
being suspended.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]