From: Alexey Korolev
Subject: Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
Date: Thu, 26 Jan 2012 16:19:45 +1300
User-agent: Mozilla/5.0 (X11; Linux i686; rv:9.0) Gecko/20111229 Thunderbird/9.0

Hi Alex and Michael
>> For testing, I applied the following patch to qemu,
>> converting msix bar to 64 bit.
>> Guest did not seem to crash.
>> I booted Fedora Live CD 32 bit guest on a 32 bit host
>> to level 3 without crash, and verified that
>> the BAR is a 64 bit one, and that I got assigned an address
>> at fe000000.
>> command line I used:
>> qemu-system-x86_64 -bios /scm/seabios/out/bios.bin -snapshot -drive
>> file=qemu-images/f15-test.qcow2,if=none,id=diskid,cache=unsafe
>> -device virtio-blk-pci,drive=diskid -net user -net nic,model=ne2k_pci
>> -cdrom Fedora-15-i686-Live-LXDE.iso
>>
>> At boot prompt type tab and add '3' to kernel command line
>> to have guest boot into a fast text console instead
>> of a graphical one which is very slow.
>>
>> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
>> index 2ac87ea..5271394 100644
>> --- a/hw/virtio-pci.c
>> +++ b/hw/virtio-pci.c
>> @@ -711,7 +711,8 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
>>      memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
>>      if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
>>                                       &proxy->msix_bar, 1, 0)) {
>> -        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
>> +        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY |
>> +                     PCI_BASE_ADDRESS_MEM_TYPE_64,
>>                           &proxy->msix_bar);
>>      } else
>>          vdev->nvectors = 0;
>>
> I was also able to add MEM64 BARs to device assignment pretty trivially
> and it seems to work, guest sees 64bit BARs for an 82576 VF, programs it
> to an fexxxxxx address and it works.
>
> Alex
>

To reproduce the problem, I'd suggest using ivshmem with a 32MB buffer,
for example in a 2.6.18 guest.

The msix case does not fail because:
1. The buffer is only 4KB, so the BAR temporarily reprograms just the range
0xFFFFF000-0xFFFFFFFF, which doesn't overlap any resource critical enough to
cause an immediate panic.
2. The memory_region_init function doesn't create a backing user memory
region, so kvm does no remapping in this case.
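
The range arithmetic behind point 1 can be sketched in C. This is only an illustration, not code from qemu or the kernel: it assumes the guest sizes the BAR by writing all ones to its low dword while the upper dword is zero, and that the HPET register block is 1KB long. The helper names are made up for the example; the HPET base is the one reported in the boot log.

```c
#include <stdbool.h>
#include <stdint.h>

#define HPET_BASE 0xfed00000ULL /* from the "ACPI: HPET id" line of the boot log */
#define HPET_LEN  0x400ULL      /* assumption: 1KB HPET register block */

/* Base of the window a memory BAR decodes while its low dword holds the
 * all-ones sizing pattern and its upper dword is zero. The address bits
 * below the (power-of-two) BAR size always read back as zero. */
static uint64_t sizing_window_base(uint64_t bar_size)
{
    return 0xFFFFFFFFULL & ~(bar_size - 1);
}

/* True if [a, a + a_len) and [b, b + b_len) share at least one byte. */
static bool ranges_overlap(uint64_t a, uint64_t a_len,
                           uint64_t b, uint64_t b_len)
{
    return a < b + b_len && b < a + a_len;
}

/* Does the temporary sizing window of a BAR of this size cover the HPET? */
static bool sizing_window_hits_hpet(uint64_t bar_size)
{
    return ranges_overlap(sizing_window_base(bar_size), bar_size,
                          HPET_BASE, HPET_LEN);
}
```

For a 32MB BAR, sizing_window_base() gives 0xfe000000 and the window reaches down past the HPET; a 4KB BAR stays well above it.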

To reproduce, apply the following patch and add this to the qemu command
line: --device ivshmem,size=32,shm="shm"
---
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 1aa9e3b..71f8c21 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
     memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
 
     /* region for shared memory */
-    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
+    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
 }
 
 static void close_guest_eventfds(IVShmemState *s, int posn)
---

You then get the following boot log:


Bootdata ok (command line is root=/dev/hda1 console=ttyS0,115200n8 console=tty0)
Linux version 2.6.18 (address@hidden) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #3 SMP Tue Jan 17 16:37:33 NZDT 2012
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
 BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fffd000 (usable)
 BIOS-e820: 000000007fffd000 - 0000000080000000 (reserved)
 BIOS-e820: 00000000feffc000 - 00000000ff000000 (reserved)
 BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
DMI 2.4 present.
No NUMA configuration found
Faking a node at 0000000000000000-000000007fffd000
Bootmem setup node 0 0000000000000000-000000007fffd000
ACPI: PM-Timer IO Port: 0xb008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:2 APIC version 17
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
Setting APIC routing to physical flat
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:7effc000)
SMP: Allowing 1 CPUs, 0 hotplug CPUs
Built 1 zonelists.  Total pages: 515393
Kernel command line: root=/dev/hda1 console=ttyS0,115200n8 console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
time.c: Using 100.000000 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2500.081 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Checking aperture...
Memory: 2058096k/2097140k available (3256k kernel code, 38656k reserved, 2266k data, 204k init)
Calibrating delay using timer specific routine.. 5030.07 BogoMIPS (lpj=10060155)
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
MCE: warning: using only 10 banks
SMP alternatives: switching to UP code
Freeing SMP alternatives: 36k freed
ACPI: Core revision 20060707
activating NMI Watchdog ... done.
Using local APIC timer interrupts.
result 62501506
Detected 62.501 MHz APIC timer.
Brought up 1 CPUs
testing NMI watchdog ... OK.
migration_cost=0
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
PCI quirk: region b000-b03f claimed by PIIX4 ACPI
PCI quirk: region b100-b10f claimed by PIIX4 SMB
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKS] (IRQs 9) *0, disabled.
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
divide error: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18 #3
RIP: 0010:[<ffffffff80388299>]  [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
RSP: 0000:ffff81007e3a1e20  EFLAGS: 00010246
RAX: 00038d7ea4c68000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8057fc2b
RBP: ffff81007e2e28c0 R08: ffffffff8055b492 R09: ffff81007e39f510
R10: ffff81007e3a1e50 R11: 0000000000000098 R12: ffff81007e3a1e50
R13: 0000000000000000 R14: ffffffffff5fe000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
Stack:  0000000000000000 ffffffff80847470 0000000000000000 0000000000000000
 0000000000000000 ffffffff8081e187 00000000fed00000 ffffffffff5fe000
 0000000300010001 0000000800000002 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8081e187>] late_hpet_init+0xa7/0xb2
 [<ffffffff8020717f>] init+0x139/0x2fe
 [<ffffffff8020a5b4>] child_rip+0xa/0x12
DWARF2 unwinder stuck at child_rip+0xa/0x12
Leftover inexact backtrace:
 [<ffffffff803544b6>] acpi_ds_init_one_object+0x0/0x82
 [<ffffffff80207046>] init+0x0/0x2fe
 [<ffffffff8020a5aa>] child_rip+0x0/0x12


Code: 48 f7 f6 83 7d 30 01 8b 75 34 48 89 45 20 49 8b 4c 24 08 48
RIP  [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
 RSP <ffff81007e3a1e20>
 <0>Kernel panic - not syncing: Attempted to kill init!
 NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18 #3
RIP: 0010:[<ffffffff8033fa93>]  [<ffffffff8033fa93>] __delay+0x6/0x10
RSP: 0000:ffff81007e3a1b50  EFLAGS: 00000293
RAX: 00000000000480f3 RBX: 0000000000000000 RCX: 000000008dea8c6a
RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000265e28
RBP: 00000000000009b0 R08: 0000000000000000 R09: ffff8100010503d4
R10: 0000000000000001 R11: ffffffff8034e288 R12: 0000000000000000
R13: 000000000000000b R14: ffffffff8055bc9f R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
Stack:  ffffffff80230a09 0000003000000008 ffff81007e3a1c48 ffff81007e3a1b78
 0000000000000246 ffffffff8055bc9f 0000000000000246 ffff81007e39f510
 0000000000000000 0000000000000000 ffff8100010503d4 0000000000000000
Call Trace:
 [<ffffffff80230a09>] panic+0x12c/0x12f
 [<ffffffff802338c5>] do_exit+0x85/0x87b
 [<ffffffff8020b0df>] kernel_math_error+0x0/0x90

Code: 0f 31 29 c8 48 39 f8 72 f5 c3 65 8b 04 25 2c 00 00 00 48 98
console shuts up ...
 <0>Kernel panic - not syncing: Attempted to kill init!


Please look at the HPET lines: HPET is mapped at 0xfed00000.
The ivshmem BAR is 32MB, so during PCI enumeration it temporarily claims the
range 0xfe000000-0xffffffff, which overlaps the HPET registers.
When Linux later runs late_hpet_init, it reads garbage from the HPET and
panics.
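
The register dump fits this reading: the faulting bytes "48 f7 f6" decode to a 64-bit divide by RSI, RSI is 0, and RAX holds 0x00038d7ea4c68000, which is 10^15 (femtoseconds per second). That looks like the tick-frequency calculation dividing by a counter period of zero. The sketch below is not the 2.6.18 hpet_alloc code, only a guarded version of the same arithmetic with made-up helper names. It assumes the period is taken from bits 63:32 of the HPET capabilities register, as the HPET specification defines.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEMTOSECONDS_PER_SECOND 1000000000000000ULL /* 10^15 */

/* Counter period in femtoseconds: bits 63:32 of the HPET general
 * capabilities register. */
static uint32_t hpet_period_fs(uint64_t cap)
{
    return (uint32_t)(cap >> 32);
}

/* Tick frequency in Hz, rounded to nearest. Returns false instead of
 * dividing when the period reads back as zero, which is what a guest
 * would see if the HPET registers are shadowed by another device. */
static bool hpet_tick_freq(uint64_t cap, uint64_t *freq_hz)
{
    uint64_t period = hpet_period_fs(cap);

    if (period == 0)
        return false;
    *freq_hz = (FEMTOSECONDS_PER_SECOND + period / 2) / period;
    return true;
}
```

With a period of 10000000 fs this yields the 100 MHz that time.c reports earlier in the log; with a capabilities value of zero the unguarded division would fault.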

Thanks,
Alexey






