[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 4/4] vl: Prioritize realizations of devices
From: |
Eduardo Habkost |
Subject: |
Re: [PATCH 4/4] vl: Prioritize realizations of devices |
Date: |
Mon, 23 Aug 2021 17:07:03 -0400 |
On Mon, Aug 23, 2021 at 03:18:51PM -0400, Peter Xu wrote:
> On Mon, Aug 23, 2021 at 02:49:12PM -0400, Eduardo Habkost wrote:
> > On Wed, Aug 18, 2021 at 03:43:18PM -0400, Peter Xu wrote:
> > > QEMU creates -device objects in order as specified by the user's cmdline.
> > > However that ordering may not be the ideal order. For example, some
> > > platform
> > > devices (vIOMMUs) may want to be created earlier than most of the rest
> > > devices (e.g., vfio-pci, virtio).
> > >
> > > This patch orders the QemuOptsList of '-device's so they'll be sorted
> > > first
> > > before kicking off the device realizations. This will allow the device
> > > realization code to be able to use APIs like
> > > pci_device_iommu_address_space()
> > > correctly, because those functions rely on the platfrom devices being
> > > realized.
> > >
> > > Now we rely on vmsd->priority which is defined as MigrationPriority to
> > > provide
> > > the ordering, as either VM init and migration completes will need such an
> > > ordering. In the future we can move that priority information out of
> > > vmsd.
> > >
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> >
> > Can we be 100% sure that changing the ordering of every single
> > device being created won't affect guest ABI? (I don't think we can)
>
> That's a good question, however I doubt whether there's any real-world guest
> ABI for that. As a developer, I normally specify cmdline parameter in an
> adhoc
> way, so that I assume most parameters are not sensitive to ordering and I can
> tune the ordering as wish. I'm not sure whether that's common for qemu users,
> I would expect so, but I may have missed something that I'm not aware of.
To give just one example:
$ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device virtio-net-pci
-device e1000e -monitor stdio | tail -n 20
Bus 0, device 4, function 0:
Ethernet controller: PCI device 1af4:1000
PCI subsystem 1af4:0001
IRQ 0, pin A
BAR0: I/O at 0xffffffffffffffff [0x001e].
BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
BAR4: 64 bit prefetchable memory at 0xffffffffffffffff [0x00003ffe].
BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
id ""
Bus 0, device 5, function 0:
Ethernet controller: PCI device 8086:10d3
PCI subsystem 8086:0000
IRQ 0, pin A
BAR0: 32 bit memory at 0xffffffffffffffff [0x0001fffe].
BAR1: 32 bit memory at 0xffffffffffffffff [0x0001fffe].
BAR2: I/O at 0xffffffffffffffff [0x001e].
BAR3: 32 bit memory at 0xffffffffffffffff [0x00003ffe].
BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
id ""
(qemu) quit
$ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device e1000e -device
virtio-net-pci -monitor stdio | tail -n 20
Bus 0, device 4, function 0:
Ethernet controller: PCI device 8086:10d3
PCI subsystem 8086:0000
IRQ 0, pin A
BAR0: 32 bit memory at 0xffffffffffffffff [0x0001fffe].
BAR1: 32 bit memory at 0xffffffffffffffff [0x0001fffe].
BAR2: I/O at 0xffffffffffffffff [0x001e].
BAR3: 32 bit memory at 0xffffffffffffffff [0x00003ffe].
BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
id ""
Bus 0, device 5, function 0:
Ethernet controller: PCI device 1af4:1000
PCI subsystem 1af4:0001
IRQ 0, pin A
BAR0: I/O at 0xffffffffffffffff [0x001e].
BAR1: 32 bit memory at 0xffffffffffffffff [0x00000ffe].
BAR4: 64 bit prefetchable memory at 0xffffffffffffffff [0x00003ffe].
BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
id ""
(qemu) quit
If the order of the -device arguments changes, the devices are assigned to
different PCI slots.
>
> Per my knowledge the only "guest ABI" change is e.g. when we specify
> "vfio-pci"
> to be before "intel-iommu": it'll be constantly broken before this patchset,
> while after this series it'll be working. It's just that I don't think those
> "guest ABI" is necessary to be kept, and that's exactly what I want to fix
> with
> the patchset..
If the only ordering changes caused by this patch were intentional and affected
only configurations that are known to be broken (like vfio-pci vs intel-iommu),
I would agree.
However, if we are reordering every single -device option in an unspecified way
(like qsort() does when elements compare as equal), we are probably breaking
guest ABI and creating a completely different machine (like in the PCI example
above).
>
> >
> > How many device types in QEMU have non-default vmsd priority?
>
> Not so much; here's the list of priorities and the devices using it:
>
> |--------------------+---------|
> | priority | devices |
> |--------------------+---------|
> | MIG_PRI_IOMMU | 3 |
> | MIG_PRI_PCI_BUS | 7 |
> | MIG_PRI_VIRTIO_MEM | 1 |
> | MIG_PRI_GICV3_ITS | 1 |
> | MIG_PRI_GICV3 | 1 |
> |--------------------+---------|
>
> All the rest devices are using the default (0) priority.
>
> >
> > Can we at least ensure devices with the same priority won't be
> > reordered, just to be safe? (qsort() doesn't guarantee that)
> >
> > If very few device types have non-default vmsd priority and
> > devices with the same priority aren't reordered, the risk of
> > compatibility breakage would be much smaller.
>
> I'm also wondering whether it's a good thing to break some guest ABI due to
> this change, if possible.
>
> Let's imagine something breaks after applied, then the only reason should be
> that qsort() changed the order of some same-priority devices and it's not the
> same as user specified any more. Then, does it also means there's yet another
> ordering requirement that we didn't even notice?
Does the PCI slot assignment example above demonstrate a ordering requirement?
I don't think it does. It just demonstrates that different orderings are
expected to give different results, but both orderings are valid.
>
> I doubt whether that'll even happen (or I think there'll be report already, as
> in qemu man page there's no requirement on parameter ordering). In all cases,
> instead of "keeping the same priority devices in the same order as the user
> has
> specified", IMHO we should make the broken devices to have different
> priorities
> so the ordering will be guaranteed by qemu internal, rather than how user
> specified it.
>
> From that pov, maybe this patchset would be great if it can be accepted and
> applied in early stage of a release? So we can figure out what's missing and
> fix them within the same release. However again I still doubt whether there's
> any user that will break in a bad way.
>
> Thanks,
>
> --
> Peter Xu
>
--
Eduardo
- [PATCH 0/4] vl: Prioritize device realizations, Peter Xu, 2021/08/18
- [PATCH 1/4] qdev-monitor: Trace qdev creation, Peter Xu, 2021/08/18
- [PATCH 2/4] qemu-config: Allow in-place sorting of QemuOptsList, Peter Xu, 2021/08/18
- [PATCH 3/4] qdev: Export qdev_get_device_class(), Peter Xu, 2021/08/18
- [PATCH 4/4] vl: Prioritize realizations of devices, Peter Xu, 2021/08/18
- Re: [PATCH 4/4] vl: Prioritize realizations of devices, Eduardo Habkost, 2021/08/23
- Re: [PATCH 4/4] vl: Prioritize realizations of devices, Eduardo Habkost, 2021/08/23
- Re: [PATCH 4/4] vl: Prioritize realizations of devices, Peter Xu, 2021/08/23
- Re: [PATCH 4/4] vl: Prioritize realizations of devices, Markus Armbruster, 2021/08/25
- Re: [PATCH 4/4] vl: Prioritize realizations of devices, Markus Armbruster, 2021/08/25
- Re: [PATCH 4/4] vl: Prioritize realizations of devices, Peter Xu, 2021/08/25
- Re: [PATCH 4/4] vl: Prioritize realizations of devices, Peter Xu, 2021/08/25
- Re: [PATCH 4/4] vl: Prioritize realizations of devices, Markus Armbruster, 2021/08/26