Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem
From: Xulei (Stone)
Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform
Date: Thu, 19 Nov 2015 01:04:19 +0000
Dear Kevin,
Sorry for the delayed reply. This patch works well for me. Thanks a lot!
Recently I found another odd problem: a qemu-kvm VM gets stuck in SeaBIOS
after rebooting itself many times. Analyzing the SeaBIOS log attached below,
I think something may be wrong in this block of code:
/src/fw/smp.c
    u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
    while (cmos_smp_count != CountCPUs)
        asm volatile(
            // Release lock and allow other processors to use the stack.
            "  movl %%esp, %1\n"
            "  movl $0, %0\n"
            // Reacquire lock and take back ownership of stack.
            "1:rep ; nop\n"
            "  lock btsl $0, %0\n"
            "  jc 1b\n"
            : "+m" (SMPLock), "+m" (SMPStack)
            : : "cc", "memory");
    yield();
It seems that if SeaBIOS sometimes reads an incorrect CPU count from QEMU
through CMOS register 0x5f, it can really get stuck in this loop. So I wonder:
what could cause this problem after a VM has rebooted itself many times?
================bad SeaBIOS log===========
[2015-11-13 18:45:58] In resume (status=0)
[2015-11-13 18:45:58] In 32bit resume
[2015-11-13 18:45:58] Attempting a hard reboot
[2015-11-13 18:46:00] SeaBIOS (version rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org)
[2015-11-13 18:46:00] No Xen hypervisor found.
[2015-11-13 18:46:00] Running on QEMU (i440fx)
[2015-11-13 18:46:00] Running on KVM
[2015-11-13 18:46:00] RamSize: 0xc0000000 [cmos]
[2015-11-13 18:46:00] Relocating init from 0x000de8f0 to 0xbffaec00 (size 70464)
[2015-11-13 18:46:00] Found QEMU fw_cfg
[2015-11-13 18:46:00] RamBlock: addr 0x0000000000000000 len 0x00000000c0000000 [e820]
[2015-11-13 18:46:00] RamBlock: addr 0x0000000100000000 len 0x0000000340000000 [e820]
[2015-11-13 18:46:00] Moving pm_base to 0x600
[2015-11-13 18:46:00] boot order:
[2015-11-13 18:46:00] 1: /address@hidden/address@hidden/address@hidden,0
[2015-11-13 18:46:00] 2: HALT
[2015-11-13 18:46:00] CPU Mhz=2402
[2015-11-13 18:46:00] === PCI bus & bridge init ===
[2015-11-13 18:46:00] PCI: pci_bios_init_bus_rec bus = 0x0
[2015-11-13 18:46:00] === PCI device probing ===
[2015-11-13 18:46:00] Found 21 PCI devices (max PCI bus is 00)
[2015-11-13 18:46:00] === PCI new allocation pass #1 ===
[2015-11-13 18:46:00] PCI: check devices
[2015-11-13 18:46:00] === PCI new allocation pass #2 ===
[2015-11-13 18:46:00] PCI: IO: c000 - c1cf
[2015-11-13 18:46:00] PCI: 32: 00000000c0000000 - 00000000fec00000
[2015-11-13 18:46:00] PCI: map device bdf=00:1f.0 bar 0, addr 0000c000, size 00000100 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:0e.0 bar 0, addr 0000c100, size 00000040 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:0f.0 bar 0, addr 0000c140, size 00000040 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:01.2 bar 4, addr 0000c180, size 00000020 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:0d.0 bar 0, addr 0000c1a0, size 00000020 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:01.1 bar 4, addr 0000c1c0, size 00000010 [io]
[2015-11-13 18:46:00] PCI: map device bdf=00:02.0 bar 6, addr febe0000, size 00010000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:02.0 bar 1, addr febf0000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0d.0 bar 1, addr febf1000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0e.0 bar 1, addr febf2000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0f.0 bar 1, addr febf3000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:10.0 bar 0, addr febf4000, size 00001000 [mem]
[2015-11-13 18:46:00] PCI: map device bdf=00:02.0 bar 0, addr f6000000, size 02000000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:10.0 bar 2, addr f8000000, size 01000000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:03.0 bar 2, addr f9000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:04.0 bar 2, addr f9800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:05.0 bar 2, addr fa000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:06.0 bar 2, addr fa800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:07.0 bar 2, addr fb000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:08.0 bar 2, addr fb800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:09.0 bar 2, addr fc000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0a.0 bar 2, addr fc800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0b.0 bar 2, addr fd000000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: map device bdf=00:0c.0 bar 2, addr fd800000, size 00800000 [prefmem]
[2015-11-13 18:46:00] PCI: init bdf=00:00.0 id=8086:1237
[2015-11-13 18:46:00] PCI: init bdf=00:01.0 id=8086:7000
[2015-11-13 18:46:00] PIIX3/PIIX4 init: elcr=00 0c
[2015-11-13 18:46:00] PCI: init bdf=00:01.1 id=8086:7010
[2015-11-13 18:46:00] PCI: init bdf=00:01.2 id=8086:7020
[2015-11-13 18:46:00] PCI: init bdf=00:01.3 id=8086:7113
[2015-11-13 18:46:00] Using pmtimer, ioport 0x608
[2015-11-13 18:46:00] PCI: init bdf=00:02.0 id=1013:00b8
[2015-11-13 18:46:00] PCI: init bdf=00:03.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:04.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:05.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:06.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:07.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:08.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:09.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0a.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0b.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0c.0 id=15b3:1004
[2015-11-13 18:46:00] PCI: init bdf=00:0d.0 id=1af4:1003
[2015-11-13 18:46:00] PCI: init bdf=00:0e.0 id=1af4:1001
[2015-11-13 18:46:00] PCI: init bdf=00:0f.0 id=1af4:1001
[2015-11-13 18:46:00] PCI: init bdf=00:10.0 id=1af4:1110
[2015-11-13 18:46:00] PCI: init bdf=00:1f.0 id=1af4:8888
[2015-11-13 18:46:00] PCI: Using 00:02.0 for primary VGA
[2015-11-13 18:46:00] handle_smp: apic_id=1
[2015-11-13 18:46:00] handle_smp: apic_id=6
[2015-11-13 18:46:00] handle_smp: apic_id=7
[2015-11-13 18:46:00] handle_smp: apic_id=3
[2015-11-13 18:46:00] handle_smp: apic_id=2
[2015-11-13 18:46:00] handle_smp: apic_id=5
[2015-11-13 18:46:00] handle_smp: apic_id=4
======== End of log; nothing more is printed ========
>On Mon, Nov 09, 2015 at 03:06:18PM -0500, Kevin O'Connor wrote:
>> On Mon, Nov 09, 2015 at 08:32:53AM -0500, Kevin O'Connor wrote:
>> > On Fri, Nov 06, 2015 at 09:12:34AM +0000, Xulei (Stone) wrote:
>> > > >On Wed, Nov 04, 2015 at 08:48:20AM +0800, Gonglei wrote:
>> > > >I'm surprised you would see the above on a recent qemu/kvm though - as
>> > > >on a newer KVM I think the second reset would have to happen after
>> > > >HaveAttemptedReboot is set and prior to the memcpy in
>> > > >qemu_prep_reset() completing. Can you verify your KVM version?
>> > >
>> > > I've tested on KVM-3.6 and KVM-4.1.3, and I can see this problem on
>> > > both versions.
>> > > My reproduction: put an HA mechanism and a watchdog mechanism in a VM.
>> > > Deliberately let the VM lose its heartbeat and stop feeding the watchdog.
>> > > After 2 minutes (a self-defined timeout), the HA mechanism issues an
>> > > internal reboot command to the VM and the watchdog mechanism issues a
>> > > "virsh reset" from the host. The problem described above then occurs
>> > > with high probability.
>> >
>> > Ah, okay. I'm not sure what the best solution to this problem is.
>>
>> After thinking about this further, I think we can move the
>> HaveAttemptedReboot assignment after the memcpy.
>
>The previous patch could cause corruption if the memcpy() failed. I
>think the new SeaBIOS patch below should be okay though.
>
>-Kevin
>
>
>commit 8a6e44ad5c953266d2339b3299f5fb4ff32c8cbb
>Author: Kevin O'Connor <address@hidden>
>Date: Mon Nov 9 15:00:19 2015 -0500
>
> resume: Make KVM soft reboot loop detection more flexible
>
> Move the check for soft reboot loops from resume.c to shadow.c and
> directly check for the case where the memcpy fails. This prevents a
> hang if an external reboot request occurs during the BIOS memcpy.
>
> Signed-off-by: Kevin O'Connor <address@hidden>
>
>diff --git a/src/fw/shadow.c b/src/fw/shadow.c
>index ee87d36..b2f2dd8 100644
>--- a/src/fw/shadow.c
>+++ b/src/fw/shadow.c
>@@ -156,6 +156,8 @@ make_bios_readonly(void)
>         make_bios_readonly_intel(ShadowBDF, Q35_HOST_BRIDGE_PAM0);
> }
>
>+static u8 AttemptingReboot;
>+
> void
> qemu_prep_reset(void)
> {
>@@ -164,6 +166,19 @@ qemu_prep_reset(void)
>     // QEMU doesn't map 0xc0000-0xfffff back to the original rom on a
>     // reset, so do that manually before invoking a hard reset.
>     make_bios_writable();
>+    AttemptingReboot = 1;
>+    barrier();
>+    if (!AttemptingReboot)
>+        goto fail;
>+    barrier();
>     memcpy(VSYMBOL(code32flat_start), VSYMBOL(code32flat_start) + BIOS_SRC_OFFSET
>            , SYMBOL(code32flat_end) - SYMBOL(code32flat_start));
>+    barrier();
>+    if (AttemptingReboot)
>+        goto fail;
>+    return;
>+fail:
>+    // Attempt to restore code has failed - try to shutdown machine.
>+    dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
>+    apm_shutdown();
> }
>diff --git a/src/resume.c b/src/resume.c
>index a5465d8..afeadcf 100644
>--- a/src/resume.c
>+++ b/src/resume.c
>@@ -114,19 +114,10 @@ s3_resume(void)
>     farcall16big(&br);
> }
>
>-u8 HaveAttemptedReboot VARLOW;
>-
> // Attempt to invoke a hard-reboot.
> static void
> tryReboot(void)
> {
>-    if (HaveAttemptedReboot) {
>-        // Hard reboot has failed - try to shutdown machine.
>-        dprintf(1, "Unable to hard-reboot machine - attempting shutdown.\n");
>-        apm_shutdown();
>-    }
>-    HaveAttemptedReboot = 1;
>-
>     dprintf(1, "Attempting a hard reboot\n");
>
>     // Setup for reset on qemu.