qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 4/4] vl: Prioritize realizations of devices


From: Eduardo Habkost
Subject: Re: [PATCH 4/4] vl: Prioritize realizations of devices
Date: Mon, 23 Aug 2021 17:07:03 -0400

On Mon, Aug 23, 2021 at 03:18:51PM -0400, Peter Xu wrote:
> On Mon, Aug 23, 2021 at 02:49:12PM -0400, Eduardo Habkost wrote:
> > On Wed, Aug 18, 2021 at 03:43:18PM -0400, Peter Xu wrote:
> > > QEMU creates -device objects in order as specified by the user's cmdline.
> > > However that ordering may not be the ideal order.  For example, some 
> > > platform
> > > devices (vIOMMUs) may want to be created earlier than most of the rest
> > > devices (e.g., vfio-pci, virtio).
> > > 
> > > This patch orders the QemuOptsList of '-device's so they'll be sorted 
> > > first
> > > before kicking off the device realizations.  This will allow the device
> > > realization code to be able to use APIs like 
> > > pci_device_iommu_address_space()
> > > correctly, because those functions rely on the platfrom devices being 
> > > realized.
> > > 
> > > Now we rely on vmsd->priority which is defined as MigrationPriority to 
> > > provide
> > > the ordering, as either VM init and migration completes will need such an
> > > ordering.  In the future we can move that priority information out of 
> > > vmsd.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > 
> > Can we be 100% sure that changing the ordering of every single
> > device being created won't affect guest ABI?  (I don't think we can)
> 
> That's a good question, however I doubt whether there's any real-world guest
> ABI for that.  As a developer, I normally specify cmdline parameter in an 
> adhoc
> way, so that I assume most parameters are not sensitive to ordering and I can
> tune the ordering as wish.  I'm not sure whether that's common for qemu users,
> I would expect so, but I may have missed something that I'm not aware of.

To give just one example:

$ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device virtio-net-pci 
-device e1000e -monitor stdio | tail -n 20
  Bus  0, device   4, function 0:
    Ethernet controller: PCI device 1af4:1000
      PCI subsystem 1af4:0001
      IRQ 0, pin A
      BAR0: I/O at 0xffffffffffffffff [0x001e].
      BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
      BAR4: 64 bit prefetchable memory at 0xffffffffffffffff [0x00003ffe].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
      id ""
  Bus  0, device   5, function 0:
    Ethernet controller: PCI device 8086:10d3
      PCI subsystem 8086:0000
      IRQ 0, pin A
      BAR0: 32 bit memory at 0xffffffffffffffff [0x0001fffe].
      BAR1: 32 bit memory at 0xffffffffffffffff [0x0001fffe].
      BAR2: I/O at 0xffffffffffffffff [0x001e].
      BAR3: 32 bit memory at 0xffffffffffffffff [0x00003ffe].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
      id ""
(qemu) quit
$ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device e1000e -device 
virtio-net-pci -monitor stdio | tail -n 20
  Bus  0, device   4, function 0:
    Ethernet controller: PCI device 8086:10d3
      PCI subsystem 8086:0000
      IRQ 0, pin A
      BAR0: 32 bit memory at 0xffffffffffffffff [0x0001fffe].
      BAR1: 32 bit memory at 0xffffffffffffffff [0x0001fffe].
      BAR2: I/O at 0xffffffffffffffff [0x001e].
      BAR3: 32 bit memory at 0xffffffffffffffff [0x00003ffe].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
      id ""
  Bus  0, device   5, function 0:
    Ethernet controller: PCI device 1af4:1000
      PCI subsystem 1af4:0001
      IRQ 0, pin A
      BAR0: I/O at 0xffffffffffffffff [0x001e].
      BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
      BAR4: 64 bit prefetchable memory at 0xffffffffffffffff [0x00003ffe].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
      id ""
(qemu) quit


If the order of the -device arguments changes, the devices are assigned to
different PCI slots.


> 
> Per my knowledge the only "guest ABI" change is e.g. when we specify 
> "vfio-pci"
> to be before "intel-iommu": it'll be constantly broken before this patchset,
> while after this series it'll be working.  It's just that I don't think those
> "guest ABI" is necessary to be kept, and that's exactly what I want to fix 
> with
> the patchset..

If the only ordering changes caused by this patch were intentional and affected
only configurations that are known to be broken (like vfio-pci vs intel-iommu),
I would agree.

However, if we are reordering every single -device option in an unspecified way
(like qsort() does when elements compare as equal), we are probably breaking
guest ABI and creating a completely different machine (like in the PCI example 
above).


> 
> > 
> > How many device types in QEMU have non-default vmsd priority?
> 
> Not so much; here's the list of priorities and the devices using it:
> 
>        |--------------------+---------|
>        | priority           | devices |
>        |--------------------+---------|
>        | MIG_PRI_IOMMU      |       3 |
>        | MIG_PRI_PCI_BUS    |       7 |
>        | MIG_PRI_VIRTIO_MEM |       1 |
>        | MIG_PRI_GICV3_ITS  |       1 |
>        | MIG_PRI_GICV3      |       1 |
>        |--------------------+---------|
> 
> All the rest devices are using the default (0) priority.
> 
> > 
> > Can we at least ensure devices with the same priority won't be
> > reordered, just to be safe?  (qsort() doesn't guarantee that)
> > 
> > If very few device types have non-default vmsd priority and
> > devices with the same priority aren't reordered, the risk of
> > compatibility breakage would be much smaller.
> 
> I'm also wondering whether it's a good thing to break some guest ABI due to
> this change, if possible.
> 
> Let's imagine something breaks after applied, then the only reason should be
> that qsort() changed the order of some same-priority devices and it's not the
> same as user specified any more.  Then, does it also means there's yet another
> ordering requirement that we didn't even notice?

Does the PCI slot assignment example above demonstrate a ordering requirement?
I don't think it does.  It just demonstrates that different orderings are
expected to give different results, but both orderings are valid.


> 
> I doubt whether that'll even happen (or I think there'll be report already, as
> in qemu man page there's no requirement on parameter ordering).  In all cases,
> instead of "keeping the same priority devices in the same order as the user 
> has
> specified", IMHO we should make the broken devices to have different 
> priorities
> so the ordering will be guaranteed by qemu internal, rather than how user
> specified it.
> 
> From that pov, maybe this patchset would be great if it can be accepted and
> applied in early stage of a release? So we can figure out what's missing and
> fix them within the same release.  However again I still doubt whether there's
> any user that will break in a bad way.
> 
> Thanks,
> 
> -- 
> Peter Xu
> 

-- 
Eduardo




reply via email to

[Prev in Thread] Current Thread [Next in Thread]