[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region
From: |
Alexey Korolev |
Subject: |
Re: [Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region |
Date: |
Fri, 22 Feb 2013 16:03:56 +1300 |
On Wed, Feb 20, 2013 at 4:20 AM, Michael S. Tsirkin <address@hidden> wrote:
> apic overlaps PCI space. On real hardware it has
> higher priority, emulate this correctly.
>
> This should addresses the following issue:
>
>> Subject: Re: [BUG] Guest OS hangs on boot when 64bit BAR present
>> (kvm-apic-msi resource conflict)
>> Sometime ago I reported an issue about guest OS hang when 64bit BAR present.
>> http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg03189.html
>> http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg00413.html
>>
>> Some more investigation has been done, so in this post I'll try to explain
>> why it happens and offer possible solutions:
>>
>> *When the issue happens*
>> The issue occurs on Linux guest OS if kernel version <2.6.36
>> A Guest OS hangs on boot when a 64bit PCI BAR is present in a system (if we
>> use ivshmem driver for example) and occupies range within first
>> 4 GB.
>>
>> *How to reproduce*
>> I used the following qemu command to reproduce the case:
>> /usr/local/bin/qemu-system-x86_64 -M pc-1.3 -enable-kvm -m 2000 -smp
>> 1,sockets=1,cores=1,threads=1 -name Rh5332 -chardev
>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/Rh5332.monitor,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=readline -rtc
>> base=utc -boot cd -drive
>> file=/home/akorolev/rh5332.img,if=none,id=drive-ide0-0-0,format=raw -device
>> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -chardev
>> file,id=charserial0,path=/home/akorolev/serial.log -device
>> isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us
>> -vga cirrus -device ivshmem,shm,size=32M-device
>> virtio-balloon-pci,id=balloon0
>>
>> Tried different guests: Centos 5.8 64bit, RHEL 5.3 32bit, FC 12 64bit on all
>> machines hang occurs in 100% cases
>>
>> *Why it happens*
>> The issue basically comes from Linux PCI enumeration code.
>>
>> The OS enumerates 64BIT bars when device is enabled using the following
>> procedure.
>> 1. Write all FF's to lower half of 64bit BAR
>> 2. Write address back to lower half of 64bit BAR
>> 3. Write all FF's to higher half of 64bit BAR
>> 4. Write address back to higher half of 64bit BAR
>>
>> For qemu it means that qemu pci_default_write_config() recevies all FFs for
>> lower part of the 64bit BAR.
>> Then it applies the mask and converts the value to "All FF's - size + 1"
>> (FE000000 if size is 32MB).
>>
>> So for short period of time the range [0xFE000000 - 0xFFFFFFFF] will be
>> occupied by ivshmem resource.
>> For some reason it is lethal for further boot process.
>>
>> We have found that boot process screws up completely if kvm-apic-msi range
>> is overlapped even for short period of time. (We still don't
>> know why it happens, hope that the qemu maintainers can answer?)
>>
>> If we look at kvm-apic-msi memory region it is a non-overlapable memory
>> region with hardcoded address range [0xFEE00000 - 0xFEF00000].
>>
>> Here is a log we collected from render_memory_regions:
>>
>> system overlap 0 pri 0 [0x0 - 0x7fffffffffffffff]
>> kvmvapic-rom overlap 1 pri 1000 [0xca000 - 0xcd000]
>> pc.ram overlap 0 pri 0 [0xca000 - 0xcd000]
>> ++ pc.ram [0xca000 - 0xcd000] is added to view
>> ....................
>> smram-region overlap 1 pri 1 [0xa0000 - 0xc0000]
>> pci overlap 0 pri 0 [0xa0000 - 0xc0000]
>> cirrus-lowmem-container overlap 1 pri 1 [0xa0000 - 0xc0000]
>> cirrus-low-memory overlap 0 pri 0 [0xa0000 - 0xc0000]
>> ++cirrus-low-memory [0xa0000 - 0xc0000] is added to view
>> kvm-ioapic overlap 0 pri 0 [0xfec00000 - 0xfec01000]
>> ++kvm-ioapic [0xfec00000 - 0xfec01000] is added to view
>> pci-hole64 overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
>> pci overlap 0 pri 0 [0x100000000 - 0x4000000100000000]
>> pci-hole overlap 0 pri 0 [0x7d000000 - 0x100000000]
>> pci overlap 0 pri 0 [0x7d000000 - 0x100000000]
>> ivshmem-bar2-container overlap 1 pri 1 [0xfe000000 -
>> 0x100000000]
>> ivshmem.bar2 overlap 0 pri 0 [0xfe000000 - 0x100000000]
>> ++ivshmem.bar2 [0xfe000000 - 0xfec00000] is added to view
>> ++ivshmem.bar2 [0xfec01000 - 0x100000000] is added to view
>> ivshmem-mmio overlap 1 pri 1 [0xfebf1000 - 0xfebf1100]
>> e1000-mmio overlap 1 pri 1 [0xfeba0000 - 0xfebc0000]
>> cirrus-mmio overlap 1 pri 1 [0xfebf0000 - 0xfebf1000]
>> cirrus-pci-bar0 overlap 1 pri 1 [0xfa000000 - 0xfc000000]
>> vga.vram overlap 1 pri 1 [0xfa000000 - 0xfa800000]
>> ++vga.vram [0xfa000000 - 0xfa800000] is added to view
>> cirrus-bitblt-mmio overlap 0 pri 0 [0xfb000000 - 0xfb400000]
>> ++cirrus-bitblt-mmio [0xfb000000 - 0xfb400000] is added to
>> view
>> cirrus-linear-io overlap 0 pri 0 [0xfa000000 - 0xfa800000]
>> pc.bios overlap 0 pri 0 [0xfffe0000 - 0x100000000]
>> ram-below-4g overlap 0 pri 0 [0x0 - 0x7d000000]
>> pc.ram overlap 0 pri 0 [0x0 - 0x7d000000]
>> ++pc.ram [0x0 - 0xa0000] is added to view
>> ++pc.ram [0x100000 - 0x7d000000] is added to view
>> kvm-apic-msi overlap 0 pri 0 [0xfee00000 - 0xfef00000]
>>
>> As you can see from log the kvm-apic-msi is enumarated last when range
>> [0xfee00000 - 0xfef00000] is already occupied by ivshmem.bar2
>> [0xfec01000 - 0x100000000].
>>
>>
>> *Possible solutions*
>> Solution 1. Probably the best would be adding the rule that regions which
>> may not be overlapped are added to view first (In in other words
>> regions which must not be overlapped have the highest priority). Please
>> find patch in the following message.
>>
>> Solution 2. Raise priority of kvm-apic-msi resource. This is a bit
>> misleading solution, as priority is only applicable for overlap-able
>> regions, but this region must not be overlapped.
>>
>> Solution 3. Fix the issue at PCI level. Track if the resource is 64bit and
>> apply changes if both parts of 64bit BAR are programmed. (It
>> appears that real PCI bus controllers are smart enough to track 64bit BAR
>> writes on PC, so qemu could do the same? Drawbacks are that
>> tracking PCI writes is bit cumbersome, and such tracking may appear to
>> somebody as a hack)
>
> ---
>
> The following patch is under test ATM.
> Alexey, does it address the crash for you?
>
>
> diff --git a/hw/pc.c b/hw/pc.c
> index 53cc173..745c89d 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -1154,7 +1154,8 @@ void ioapic_init_gsi(GSIState *gsi_state, const char
> *parent_name)
> }
> qdev_init_nofail(dev);
> d = SYS_BUS_DEVICE(dev);
> - sysbus_mmio_map(d, 0, 0xfec00000);
> + /* APIC overlaps the PCI window. */
> + sysbus_mmio_map_overlap(d, 0, 0xfec00000, 1000);
>
> for (i = 0; i < IOAPIC_NUM_PINS; i++) {
> gsi_state->ioapic_irq[i] = qdev_get_gpio_in(dev, i);
> diff --git a/hw/sysbus.c b/hw/sysbus.c
> index 6d9d1df..40bf352 100644
> --- a/hw/sysbus.c
> +++ b/hw/sysbus.c
> @@ -66,6 +66,26 @@ void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
> dev->mmio[n].memory);
> }
>
> +void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr,
> + unsigned priority)
> +{
> + assert(n >= 0 && n < dev->num_mmio);
> +
> + if (dev->mmio[n].addr == addr) {
> + /* ??? region already mapped here. */
> + return;
> + }
> + if (dev->mmio[n].addr != (hwaddr)-1) {
> + /* Unregister previous mapping. */
> + memory_region_del_subregion(get_system_memory(),
> dev->mmio[n].memory);
> + }
> + dev->mmio[n].addr = addr;
> + memory_region_add_subregion_overlap(get_system_memory(),
> + addr,
> + dev->mmio[n].memory,
> + priority);
> +}
> +
>
> /* Request an IRQ source. The actual IRQ object may be populated later. */
> void sysbus_init_irq(SysBusDevice *dev, qemu_irq *p)
> diff --git a/hw/sysbus.h b/hw/sysbus.h
> index a7fcded..2100bd7 100644
> --- a/hw/sysbus.h
> +++ b/hw/sysbus.h
> @@ -56,6 +56,8 @@ void sysbus_init_ioports(SysBusDevice *dev, pio_addr_t
> ioport, pio_addr_t size);
>
> void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq);
> void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr);
> +void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr,
> + unsigned priority);
> void sysbus_add_memory(SysBusDevice *dev, hwaddr addr,
> MemoryRegion *mem);
> void sysbus_add_memory_overlap(SysBusDevice *dev, hwaddr addr,
>
Sorry for late response. It looks I replied just to Michael by occasion.
This patch doesn't fix the problem as KVM APIC should have high
priority (IO APIC is fine).
This patch fixes the issue: (I can send it in a new thread to make
apply process clear)
--------
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index dfcf86e..e66bf54 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2078,7 +2078,7 @@ static void x86_cpu_apic_init(X86CPU *cpu, Error **errp)
/* NOTE: the APIC is directly connected to the CPU - it is not
on the global memory bus. */
/* XXX: what if the base changes? */
- sysbus_mmio_map(SYS_BUS_DEVICE(env->apic_state), 0, MSI_ADDR_BASE);
+ sysbus_mmio_map_overlap(SYS_BUS_DEVICE(env->apic_state), 0,
MSI_ADDR_BASE, 0x1000);
apic_mapped = 1;
}
}
diff --git a/hw/sysbus.c b/hw/sysbus.c
index 6d9d1df..50c7232 100644
--- a/hw/sysbus.c
+++ b/hw/sysbus.c
@@ -48,7 +48,8 @@ void sysbus_connect_irq(SysBusDevice *dev, int n,
qemu_irq irq)
}
}
-void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
+static void sysbus_mmio_map_common(SysBusDevice *dev, int n, hwaddr addr,
+ bool may_overlap, unsigned priority)
{
assert(n >= 0 && n < dev->num_mmio);
@@ -61,11 +62,29 @@ void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
memory_region_del_subregion(get_system_memory(), dev->mmio[n].memory);
}
dev->mmio[n].addr = addr;
- memory_region_add_subregion(get_system_memory(),
- addr,
- dev->mmio[n].memory);
+ if (may_overlap) {
+ memory_region_add_subregion_overlap(get_system_memory(),
+ addr,
+ dev->mmio[n].memory,
+ priority);
+ }
+ else {
+ memory_region_add_subregion(get_system_memory(),
+ addr,
+ dev->mmio[n].memory);
+ }
}
+void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
+{
+ sysbus_mmio_map_common(dev, n, addr, false, 0);
+}
+
+void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr,
+ unsigned priority)
+{
+ sysbus_mmio_map_common(dev, n, addr, true, priority);
+}
/* Request an IRQ source. The actual IRQ object may be populated later. */
void sysbus_init_irq(SysBusDevice *dev, qemu_irq *p)
diff --git a/hw/sysbus.h b/hw/sysbus.h
index a7fcded..2100bd7 100644
--- a/hw/sysbus.h
+++ b/hw/sysbus.h
@@ -56,6 +56,8 @@ void sysbus_init_ioports(SysBusDevice *dev,
pio_addr_t ioport, pio_addr_t size);
void sysbus_connect_irq(SysBusDevice *dev, int n, qemu_irq irq);
void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr);
+void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr,
+ unsigned priority);
void sysbus_add_memory(SysBusDevice *dev, hwaddr addr,
MemoryRegion *mem);
void sysbus_add_memory_overlap(SysBusDevice *dev, hwaddr addr,
- Re: [Qemu-devel] [BUG] Guest OS hangs on boot when 64bit BAR present (kvm-apic -msi resource conflict), (continued)
[Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region, Michael S. Tsirkin, 2013/02/19
- Re: [Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region, Jan Kiszka, 2013/02/19
- Re: [Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region, Peter Maydell, 2013/02/19
- Re: [Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region, Michael S. Tsirkin, 2013/02/19
- Re: [Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region, Jan Kiszka, 2013/02/19
- Re: [Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region, Peter Maydell, 2013/02/19
- Re: [Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region, Jan Kiszka, 2013/02/19
Re: [Qemu-devel] [PATCH for-1.4] pc: tag apic as overlap region,
Alexey Korolev <=