qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PULL 02/10] pci-bridge/cxl_downstream: Add a CXL switch downstream


From: Jonathan Cameron
Subject: Re: [PULL 02/10] pci-bridge/cxl_downstream: Add a CXL switch downstream port
Date: Mon, 3 Apr 2023 17:12:32 +0100

On Fri, 24 Mar 2023 11:17:50 +0100
Thomas Huth <thuth@redhat.com> wrote:

> On 07/12/2022 14.26, Thomas Huth wrote:
> > On 07/12/2022 14.21, Jonathan Cameron wrote:  
> >> On Mon, 05 Dec 2022 14:59:39 +0000
> >> Alex Bennée <alex.bennee@linaro.org> wrote:
> >>  
> >>> Jonathan Cameron via <qemu-devel@nongnu.org> writes:
> >>>  
> >>>> On Mon, 5 Dec 2022 10:54:03 +0000
> >>>> Jonathan Cameron via <qemu-devel@nongnu.org> wrote:  
> >>>>> On Sun, 4 Dec 2022 08:23:55 +0100
> >>>>> Thomas Huth <thuth@redhat.com> wrote:  
> >>>>>> On 04/11/2022 07.47, Thomas Huth wrote:  
> >>>>>>> On 16/06/2022 18.57, Michael S. Tsirkin wrote:  
> >>>>>>>> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >>>>>>>>
> >>>>>>>> Emulation of a simple CXL Switch downstream port.
> >>>>>>>> The Device ID has been allocated for this use.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >>>>>>>> Message-Id: <20220616145126.8002-3-Jonathan.Cameron@huawei.com>
> >>>>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>>>>>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >>>>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>>>>>>> ---
> >>>>>>>>    hw/cxl/cxl-host.c              |  43 +++++-
> >>>>>>>>    hw/pci-bridge/cxl_downstream.c | 249 
> >>>>>>>> +++++++++++++++++++++++++++++++++
> >>>>>>>>    hw/pci-bridge/meson.build      |   2 +-
> >>>>>>>>    3 files changed, 291 insertions(+), 3 deletions(-)
> >>>>>>>>    create mode 100644 hw/pci-bridge/cxl_downstream.c  
> >>>>>>>
> >>>>>>>    Hi!
> >>>>>>>
> >>>>>>> There is a memory problem somewhere in this new device. I can make 
> >>>>>>> QEMU
> >>>>>>> crash by running something like this:
> >>>>>>>
> >>>>>>> $ MALLOC_PERTURB_=59 ./qemu-system-x86_64 -M x-remote \
> >>>>>>>       -display none -monitor stdio
> >>>>>>> QEMU 7.1.50 monitor - type 'help' for more information
> >>>>>>> (qemu) device_add cxl-downstream
> >>>>>>> ./qemu/qom/object.c:1188:5: runtime error: member access within 
> >>>>>>> misaligned
> >>>>>>> address 0x3b3b3b3b3b3b3b3b for type 'struct Object', which requires 8 
> >>>>>>> byte
> >>>>>>> alignment
> >>>>>>> 0x3b3b3b3b3b3b3b3b: note: pointer points here
> >>>>>>> <memory cannot be printed>
> >>>>>>> Bus error (core dumped)
> >>>>>>>
> >>>>>>> Could you have a look if you've got some spare minutes?  
> >>>>>>
> >>>>>> Ping! Jonathan, Michael, any news on this bug?
> >>>>>>
> >>>>>> (this breaks one of my local tests, that's why it's annoying for me)  
> >>>>> Sorry, my email filters ate your earlier message.
> >>>>>
> >>>>> Looking into this now. I'll note that it also happens on
> >>>>> device_add xio3130-downstream so not specific to this new device.
> >>>>>
> >>>>> So far all I've managed to do is track it to something rcu related
> >>>>> as failing in a call to drain_call_rcu() in qmp_device_add()
> >>>>>
> >>>>> Will continue digging.  
> >>>>
> >>>> Assuming I'm seeing the same thing...
> >>>>
> >>>> Problem is g_free() on the PCIBridge windows:
> >>>> https://elixir.bootlin.com/qemu/latest/source/hw/pci/pci_bridge.c#L235
> >>>>
> >>>> Is called before we get an rcu_call() to flatview_destroy() as a
> >>>> result of the final call of flatview_unref() in 
> >>>> address_space_set_flatview()
> >>>> so we get a use after free.
> >>>>
> >>>> As to what the fix is...  Suggestions welcome!  
> >>>
> >>> It sounds like this is the wrong place to free the value then. I guess
> >>> the PCI aliases into &w->alias_io() don't get dealt with until RCU
> >>> clean-up time.
> >>>
> >>> I *think* using g_free_rcu() should be enough to ensure the free occurs
> >>> after the rest of the RCU cleanups but maybe you should only be cleaning
> >>> up the windows at device unrealize time? Is this a dynamic piece of
> >>> memory which gets updated during the lifetime of the device?  
> >>
> >> There is unfortunately code that swaps it for an updated structure
> >> in pci_bridge_update_mappings()
> >>  
> >>>
> >>> If the memory is being cleared with RCU then the access to the base
> >>> pointer should be done with the appropriate qatomic_rcu_[set|read]
> >>> functions.
> >>>  
> >>
> >> I'm annoyingly snowed under this week with other things, but hopefully
> >> can get to in a few days.  Note we are looking at an old problem
> >> here, just one that's happening for an additional device, not sure
> >> if that really affects urgency of fix though.  
> > 
> > It's too late now for QEMU 7.2 anyway, so there is no hurry, I think.  
> 
> I'm still seeing problems with this device, I guess it's still the
> same issue:
> 
> $ valgrind -q ./qemu-system-x86_64 -M x-remote -monitor stdio
> ...
> QEMU 7.2.91 monitor - type 'help' for more information
> (qemu) device_add cxl-downstream,id=c1
> ==46154== Thread 2:
> ==46154== Invalid read of size 8
> ==46154==    at 0x7A0A0B: memory_region_unref (memory.c:1798)
> ==46154==    by 0x7A0A0B: flatview_destroy (memory.c:298)
> ==46154==    by 0x9A5E32: call_rcu_thread (rcu.c:284)
> ==46154==    by 0x99CC19: qemu_thread_start (qemu-thread-posix.c:541)
> ==46154==    by 0xB6A31C9: start_thread (in /usr/lib64/libpthread-2.28.so)
> ==46154==    by 0xB8F4E72: clone (in /usr/lib64/libc-2.28.so)
> ==46154==  Address 0x1a2a95c0 is 64 bytes inside a block of size 1,632 free'd
> ==46154==    at 0x4C39A93: free (vg_replace_malloc.c:872)
> ==46154==    by 0x79B62B1: g_free (in /usr/lib64/libglib-2.0.so.0.5600.4)
> ==46154==    by 0x55E06F: cxl_dsp_realize (cxl_downstream.c:201)
> ==46154==    by 0x5523AC: pci_qdev_realize (pci.c:2098)
> ==46154==    by 0x833A1E: device_set_realized (qdev.c:510)
> ==46154==    by 0x836DC5: property_set_bool (object.c:2285)
> ==46154==    by 0x838FA2: object_property_set (object.c:1420)
> ==46154==    by 0x83C01E: object_property_set_qobject (qom-qobject.c:28)
> ==46154==    by 0x839213: object_property_set_bool (object.c:1489)
> ==46154==    by 0x5F9787: qdev_device_add_from_qdict (qdev-monitor.c:714)
> ==46154==    by 0x5F98F0: qdev_device_add (qdev-monitor.c:733)
> ==46154==    by 0x5F99E1: qmp_device_add (qdev-monitor.c:855)
> ==46154==  Block was alloc'd at
> ==46154==    at 0x4C37135: malloc (vg_replace_malloc.c:381)
> ==46154==    by 0x79B61A5: g_malloc (in /usr/lib64/libglib-2.0.so.0.5600.4)
> ==46154==    by 0x553072: pci_bridge_region_init (pci_bridge.c:191)
> ==46154==    by 0x553575: pci_bridge_initfn (pci_bridge.c:388)
> ==46154==    by 0x55E032: cxl_dsp_realize (cxl_downstream.c:147)
> ==46154==    by 0x5523AC: pci_qdev_realize (pci.c:2098)
> ==46154==    by 0x833A1E: device_set_realized (qdev.c:510)
> ==46154==    by 0x836DC5: property_set_bool (object.c:2285)
> ==46154==    by 0x838FA2: object_property_set (object.c:1420)
> ==46154==    by 0x83C01E: object_property_set_qobject (qom-qobject.c:28)
> ==46154==    by 0x839213: object_property_set_bool (object.c:1489)
> ==46154==    by 0x5F9787: qdev_device_add_from_qdict (qdev-monitor.c:714)
> 
> Could we get a fix for QEMU 8.0 ?

The original option of moving over to g_free_rcu() and need to then protect
all accesses to br->windows was make a mess of things and as far as I can
immediately spot seems to be unnecessary.

Unfortunately I don't understand why the window handling is complex in the
first place.  The following patch just embeds the structure
directly in the PCI Bridge and seems to avoid the issue you have reported.

As the code to update it on the fly is protected anyway by
memory_region_transaction_begin() I don't think we care about ensuring the
exposed windows are consistent whilst we update them.

If someone else could sanity check my logic that would be great.

Thanks,

Jonathan


diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index dd5af508f9..698fd01ae6 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -184,11 +184,11 @@ static void pci_bridge_init_vga_aliases(PCIBridge *br, 
PCIBus *parent,
     }
 }
 
-static PCIBridgeWindows *pci_bridge_region_init(PCIBridge *br)
+static void pci_bridge_region_init(PCIBridge *br)
 {
     PCIDevice *pd = PCI_DEVICE(br);
     PCIBus *parent = pci_get_bus(pd);
-    PCIBridgeWindows *w = g_new(PCIBridgeWindows, 1);
+    PCIBridgeWindows *w = &br->windows;
     uint16_t cmd = pci_get_word(pd->config + PCI_COMMAND);
 
     pci_bridge_init_alias(br, &w->alias_pref_mem,
@@ -211,8 +211,6 @@ static PCIBridgeWindows *pci_bridge_region_init(PCIBridge 
*br)
                           cmd & PCI_COMMAND_IO);
 
     pci_bridge_init_vga_aliases(br, parent, w->alias_vga);
-
-    return w;
 }
 
 static void pci_bridge_region_del(PCIBridge *br, PCIBridgeWindows *w)
@@ -234,19 +232,17 @@ static void pci_bridge_region_cleanup(PCIBridge *br, 
PCIBridgeWindows *w)
     object_unparent(OBJECT(&w->alias_vga[QEMU_PCI_VGA_IO_LO]));
     object_unparent(OBJECT(&w->alias_vga[QEMU_PCI_VGA_IO_HI]));
     object_unparent(OBJECT(&w->alias_vga[QEMU_PCI_VGA_MEM]));
-    g_free(w);
 }
 
 void pci_bridge_update_mappings(PCIBridge *br)
 {
-    PCIBridgeWindows *w = br->windows;
-
+    PCIBridgeWindows *w = &br->windows;
     /* Make updates atomic to: handle the case of one VCPU updating the bridge
      * while another accesses an unaffected region. */
     memory_region_transaction_begin();
-    pci_bridge_region_del(br, br->windows);
+    pci_bridge_region_del(br, w);
     pci_bridge_region_cleanup(br, w);
-    br->windows = pci_bridge_region_init(br);
+    pci_bridge_region_init(br);
     memory_region_transaction_commit();
 }
 
@@ -385,7 +381,7 @@ void pci_bridge_initfn(PCIDevice *dev, const char *typename)
     sec_bus->address_space_io = &br->address_space_io;
     memory_region_init(&br->address_space_io, OBJECT(br), "pci_bridge_io",
                        4 * GiB);
-    br->windows = pci_bridge_region_init(br);
+    pci_bridge_region_init(br);
     QLIST_INIT(&sec_bus->child);
     QLIST_INSERT_HEAD(&parent->child, sec_bus, sibling);
 }
@@ -396,8 +392,8 @@ void pci_bridge_exitfn(PCIDevice *pci_dev)
     PCIBridge *s = PCI_BRIDGE(pci_dev);
     assert(QLIST_EMPTY(&s->sec_bus.child));
     QLIST_REMOVE(&s->sec_bus, sibling);
-    pci_bridge_region_del(s, s->windows);
-    pci_bridge_region_cleanup(s, s->windows);
+    pci_bridge_region_del(s, &s->windows);
+    pci_bridge_region_cleanup(s, &s->windows);
     /* object_unparent() is called automatically during device deletion */
 }
 
diff --git a/include/hw/pci/pci_bridge.h b/include/hw/pci/pci_bridge.h
index 81a058bb2c..b650748a39 100644
--- a/include/hw/pci/pci_bridge.h
+++ b/include/hw/pci/pci_bridge.h
@@ -30,6 +30,7 @@
 #include "hw/pci/pci_bus.h"
 #include "hw/cxl/cxl.h"
 #include "qom/object.h"
+#include "qemu/rcu.h"
 
 typedef struct PCIBridgeWindows PCIBridgeWindows;
 
@@ -39,8 +40,11 @@ typedef struct PCIBridgeWindows PCIBridgeWindows;
  * as subregions.
  */
 struct PCIBridgeWindows {
+//    struct rcu_head rcu;
     MemoryRegion alias_pref_mem;
+    //   uint8_t pad;
     MemoryRegion alias_mem;
+    // uint8_t pad2;
     MemoryRegion alias_io;
     /*
      * When bridge control VGA forwarding is enabled, bridges will
@@ -73,7 +77,7 @@ struct PCIBridge {
     MemoryRegion address_space_mem;
     MemoryRegion address_space_io;
 
-    PCIBridgeWindows *windows;
+    PCIBridgeWindows windows;
 
     pci_map_irq_fn map_irq;
     const char *bus_name;

> 
>   Thomas
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]