qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region


From: Michael S. Tsirkin
Subject: [Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region
Date: Tue, 19 Feb 2013 17:20:57 +0200

apic overlaps PCI space. On real hardware it has
higher priority, emulate this correctly.

This should addresses the following issue:

> Subject: Re: [BUG] Guest OS hangs on boot when 64bit BAR present 
> (kvm-apic-msi resource conflict)
> Sometime ago I reported an issue about guest OS hang when 64bit BAR present.
> http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg03189.html
> http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg00413.html
> 
> Some more investigation has been done, so in this post I'll try to explain 
> why it happens and offer possible solutions:
> 
> *When the issue happens*
> The issue occurs on Linux guest OS if kernel version <2.6.36
> A Guest OS hangs on boot when a 64bit PCI BAR is present in a system (if we 
> use ivshmem driver for example) and occupies range within first
> 4 GB.
> 
> *How to reproduce*
> I used the following qemu command to reproduce the case:
> /usr/local/bin/qemu-system-x86_64 -M pc-1.3 -enable-kvm -m 2000 -smp 
> 1,sockets=1,cores=1,threads=1 -name Rh5332 -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/Rh5332.monitor,server,nowait 
> -mon chardev=charmonitor,id=monitor,mode=readline -rtc
> base=utc -boot cd -drive 
> file=/home/akorolev/rh5332.img,if=none,id=drive-ide0-0-0,format=raw -device
> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -chardev 
> file,id=charserial0,path=/home/akorolev/serial.log -device
> isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga 
> cirrus -device ivshmem,shm,size=32M-device
> virtio-balloon-pci,id=balloon0
> 
> Tried different guests: Centos 5.8 64bit, RHEL 5.3 32bit, FC 12 64bit on all 
> machines hang occurs in 100% cases
> 
> *Why it happens*
> The issue basically comes from Linux PCI enumeration code.
> 
> The OS enumerates 64BIT bars when device is enabled using the following 
> procedure.
> 1. Write all FF's to lower half of 64bit BAR
> 2. Write address back to lower half of 64bit BAR
> 3. Write all FF's to higher half of 64bit BAR
> 4. Write address back to higher half of 64bit BAR
> 
> For qemu it means that  qemu pci_default_write_config() recevies all FFs for 
> lower part of the 64bit BAR.
> Then it applies the mask and converts the value to "All FF's - size + 1" 
> (FE000000 if size is 32MB).
> 
> So for short period of time the range [0xFE000000 - 0xFFFFFFFF] will be 
> occupied by ivshmem resource.
> For some reason it is lethal for further boot process.
> 
> We have found that boot process screws up completely if kvm-apic-msi range is 
> overlapped even for short period of time.  (We still don't
> know why it happens, hope that the qemu maintainers can answer?)
> 
> If we look at kvm-apic-msi memory region it is a non-overlapable memory 
> region with hardcoded address range [0xFEE00000 - 0xFEF00000].
> 
> Here is a log we collected from render_memory_regions:
> 
>  system overlap 0 pri 0 [0x0 - 0x7fffffffffffffff]
>      kvmvapic-rom overlap 1 pri 1000 [0xca000 - 0xcd000]
>          pc.ram overlap 0 pri 0 [0xca000 - 0xcd000]
>          ++ pc.ram [0xca000 - 0xcd000] is added to view
>      ....................
>      smram-region overlap 1 pri 1 [0xa0000 - 0xc0000]
>          pci overlap 0 pri 0 [0xa0000 - 0xc0000]
>              cirrus-lowmem-container overlap 1 pri 1 [0xa0000 - 0xc0000]
>                  cirrus-low-memory overlap 0 pri 0 [0xa0000 - 0xc0000]
>                 ++cirrus-low-memory [0xa0000 - 0xc0000] is added to view
>      kvm-ioapic overlap 0 pri 0 [0xfec00000 - 0xfec01000]
>     ++kvm-ioapic [0xfec00000 - 0xfec01000] is added to view
>      pci-hole64 overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
>          pci overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
>      pci-hole overlap 0 pri 0 [0x7d000000 - 0x100000000]
>          pci overlap 0 pri 0 [0x7d000000 - 0x100000000]
>              ivshmem-bar2-container overlap 1 pri 1 [0xfe000000 - 0x100000000]
>                  ivshmem.bar2 overlap 0 pri 0 [0xfe000000 - 0x100000000]
>                 ++ivshmem.bar2 [0xfe000000 - 0xfec00000] is added to view
>                 ++ivshmem.bar2  [0xfec01000 - 0x100000000] is added to view
>              ivshmem-mmio overlap 1 pri 1 [0xfebf1000 - 0xfebf1100]
>              e1000-mmio overlap 1 pri 1 [0xfeba0000 - 0xfebc0000]
>              cirrus-mmio overlap 1 pri 1 [0xfebf0000 - 0xfebf1000]
>              cirrus-pci-bar0 overlap 1 pri 1 [0xfa000000 - 0xfc000000]
>                  vga.vram overlap 1 pri 1 [0xfa000000 - 0xfa800000]
>                 ++vga.vram [0xfa000000 - 0xfa800000] is added to view
>                  cirrus-bitblt-mmio overlap 0 pri 0 [0xfb000000 - 0xfb400000]
>                 ++cirrus-bitblt-mmio [0xfb000000 - 0xfb400000] is added to 
> view
>                  cirrus-linear-io overlap 0 pri 0 [0xfa000000 - 0xfa800000]
>              pc.bios overlap 0 pri 0 [0xfffe0000 - 0x100000000]
>      ram-below-4g overlap 0 pri 0 [0x0 - 0x7d000000]
>          pc.ram overlap 0 pri 0 [0x0 - 0x7d000000]
>         ++pc.ram [0x0 - 0xa0000] is added to view
>         ++pc.ram [0x100000 - 0x7d000000] is added to view
>      kvm-apic-msi overlap 0 pri 0 [0xfee00000 - 0xfef00000]
> 
> As you can see from log the kvm-apic-msi is enumarated last when range 
> [0xfee00000 - 0xfef00000] is already occupied by ivshmem.bar2
> [0xfec01000 - 0x100000000].
> 
> 
> *Possible solutions*
> Solution 1. Probably the best would be adding the rule that regions which may 
> not be overlapped are added to view first (In in other words
> regions which must not be overlapped have the highest priority).  Please find 
> patch in the following message.
> 
> Solution 2. Raise priority of kvm-apic-msi resource. This is a bit misleading 
> solution, as priority is only applicable for overlap-able
> regions, but this region must not be overlapped.
> 
> Solution 3. Fix the issue at PCI level. Track if the resource is 64bit and 
> apply changes if both parts of 64bit BAR are programmed. (It
> appears that real PCI bus controllers are smart enough to track 64bit BAR 
> writes on PC, so qemu could do the same? Drawbacks are that
> tracking PCI writes is bit cumbersome, and such tracking may appear to 
> somebody as a hack)

---

The following patch is under test ATM.
Alexey, does it address the crash for you?


diff --git a/hw/pc.c b/hw/pc.c
index 53cc173..745c89d 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1154,7 +1154,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char 
*parent_name)
     }
     qdev_init_nofail(dev);
     d = SYS_BUS_DEVICE(dev);
-    sysbus_mmio_map(d, 0, 0xfec00000);
+    /* APIC overlaps the PCI window. */
+    sysbus_mmio_map_overlap(d, 0, 0xfec00000, 1000);
 
     for (i = 0; i < IOAPIC_NUM_PINS; i++) {
         gsi_state->ioapic_irq[i] = qdev_get_gpio_in(dev, i);
diff --git a/hw/sysbus.c b/hw/sysbus.c
index 6d9d1df..40bf352 100644
--- a/hw/sysbus.c
+++ b/hw/sysbus.c
@@ -66,6 +66,26 @@ void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
                                 dev->mmio[n].memory);
 }
 
+void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr,
+                             unsigned priority)
+{
+    assert(n >= 0 && n < dev->num_mmio);
+
+    if (dev->mmio[n].addr == addr) {
+        /* ??? region already mapped here.  */
+        return;
+    }
+    if (dev->mmio[n].addr != (hwaddr)-1) {
+        /* Unregister previous mapping.  */
+        memory_region_del_subregion(get_system_memory(), dev->mmio[n].memory);
+    }
+    dev->mmio[n].addr = addr;
+    memory_region_add_subregion_overlap(get_system_memory(),
+                                        addr,
+                                        dev->mmio[n].memory,
+                                        priority);
+}
+
 
 /* Request an IRQ source.  The actual IRQ object may be populated later.  */
 void sysbus_init_irq(SysBusDevice *dev, qemu_irq *p)
diff --git a/hw/sysbus.h b/hw/sysbus.h
index a7fcded..2100bd7 100644
--- a/hw/sysbus.h
+++ b/hw/sysbus.h
@@ -56,6 +56,8 @@ void sysbus_init_ioports(SysBusDevice *dev, pio_addr_t 
ioport, pio_addr_t size);
 
 void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq);
 void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr);
+void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr,
+                             unsigned priority);
 void sysbus_add_memory(SysBusDevice *dev, hwaddr addr,
                        MemoryRegion *mem);
 void sysbus_add_memory_overlap(SysBusDevice *dev, hwaddr addr,



reply via email to

[Prev in Thread] Current Thread [Next in Thread]