[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 4/4] intel_iommu: Reset vIOMMU at the last stage of system re
From: |
Eric Auger |
Subject: |
Re: [PATCH 4/4] intel_iommu: Reset vIOMMU at the last stage of system reset |
Date: |
Wed, 17 Jan 2024 11:38:55 +0100 |
User-agent: |
Mozilla Thunderbird |
Hi Peter,
On 1/17/24 10:15, peterx@redhat.com wrote:
> From: Peter Xu <peterx@redhat.com>
>
> We got report from Yanghang Liu on an unexpected host DMA error when system
> resets with VFIO attached to vIOMMU in the VM context. Alex Williamson
> quickly spot that there can be ordering issues on resets. A further test
> verified that the issue is indeed caused by such wrong ordering.
nit: not sure the commit msg should contain people info (cover letter
does it already + credits below)
>
> vIOMMU is a fundamentally infrustructural device, its reset is currently
infrastructural
> problematic because no ordering is guaranteed against other PCI devices
> which may DMA through the vIOMMU device.
>
> The reset order is tricky, not only because it's current representation as
s/it's/its
> a normal "-device" (so it kind of follow the qdev tree depth-first reset,
> but at a wrong place in the qtree; ideally it should be the parent
> somewhere for all pci buses, or just part of pci host bridge), but also
> because customized device reset hooks registered over the system reset
> framework, so that the ordering of the vIOMMU reset is not guaranteed.
>
> For example, VFIO can register its reset hook with vfio_reset_handler() if
> some device does not support FLR. That will not so far follow the
> depth-first travelsal reset mechanism provided by QEMU reset framework.
traversal
>
> To remedy both of the issues with limited code changes, leverage the newly
> introduced reset stage framework to reset vIOMMUs at the last stage of the
> rest devices. More information can be found in the comments in the patch,
> which I decided to persist even with the code to make the problem even
> clearer (with potential TODOs for the future, if possible).
>
> Buglink: https://issues.redhat.com/browse/RHEL-7188
> Analyzed-by: Alex Williamson <alex.williamson@redhat.com>
> Reported-by: YangHang Liu <yanghliu@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> hw/i386/intel_iommu.c | 54 +++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 52 insertions(+), 2 deletions(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 8b467cbbd2..5a8fbcad7a 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -35,6 +35,7 @@
> #include "sysemu/kvm.h"
> #include "sysemu/dma.h"
> #include "sysemu/sysemu.h"
> +#include "sysemu/reset.h"
> #include "hw/i386/apic_internal.h"
> #include "kvm/kvm_i386.h"
> #include "migration/vmstate.h"
> @@ -4086,7 +4087,7 @@ static void vtd_init(IntelIOMMUState *s)
> /* Should not reset address_spaces when reset because devices will still use
> * the address space they got at first (won't ask the bus again).
> */
> -static void vtd_reset(DeviceState *dev)
> +static void vtd_reset(void *dev)
> {
> IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
>
> @@ -4242,7 +4243,6 @@ static void vtd_class_init(ObjectClass *klass, void
> *data)
> DeviceClass *dc = DEVICE_CLASS(klass);
> X86IOMMUClass *x86_class = X86_IOMMU_DEVICE_CLASS(klass);
>
> - dc->reset = vtd_reset;
> dc->vmsd = &vtd_vmstate;
> device_class_set_props(dc, vtd_properties);
> dc->hotpluggable = false;
> @@ -4254,10 +4254,60 @@ static void vtd_class_init(ObjectClass *klass, void
> *data)
> dc->desc = "Intel IOMMU (VT-d) DMA Remapping device";
> }
>
> +static void vtd_instance_init(Object *obj)
> +{
> + IntelIOMMUState *s = INTEL_IOMMU_DEVICE(obj);
> +
> + /*
> + * vIOMMU reset may require proper ordering with other devices. There
> + * are two complexities so that normal DeviceState.reset() may not
> + * work properly for vIOMMUs:
> + *
> + * (1) Device depth-first reset hierachy doesn't yet work for vIOMMUs
> + * (reference: resettable_cold_reset_fn())
> + *
> + * Currently, vIOMMU devices are created as normal '-device'
> + * cmdlines. It means in many ways it has the same attributes with
s/with/as
> + * most of the rest devices, even if the rest devices should
s/rest/other
> + * logically be under control of the vIOMMU unit.
> + *
> + * One side effect of it is vIOMMU devices will be currently put
> + * randomly under qdev tree hierarchy, which is the source of
> + * device reset ordering in current QEMU (depth-first traversal).
> + * It means vIOMMU now can be reset before some devices. For fully
> + * emulated devices that's not a problem, because the traversal
> + * holds BQL for the whole process. However it is a problem if DMA
> + * can happen without BQL, like VFIO, vDPA or remote device process.
> + *
> + * TODO: one ideal solution can be that we make vIOMMU the parent
> + * of the whole pci host bridge. Hence vIOMMU can be reset after
> + * all the devices are reset and quiesced.
in hw/pci/pci.c there is also the info iommu_bus? When resetting a pci
device know whether it is attached to a viommu. I don't know if we could
plug the reset priority mechanism at this level.
> + *
> + * (2) Some devices register its own reset functions
> + *
> + * Even if above issue solved, if devices register its own reset
s/its/their
> + * functions for some reason via QEMU reset hooks, vIOMMU can still
> + * be reset before the device. One example is vfio_reset_handler()
> + * where FLR is not supported on the device.
> + *
> + * TODO: merge relevant reset functions into the device tree reset
> + * framework.
> + *
> + * Neither of the above TODO may be trivial. To make it work for now,
> + * leverage reset stages and reset vIOMMU always at latter stage of the
> + * default. It means it needs to be reset after at least:
> + *
> + * - resettable_cold_reset_fn(): machine qdev tree reset
> + * - vfio_reset_handler(): VFIO reset for !FLR
> + */
> + qemu_register_reset_one(vtd_reset, s, false, 1);
introducing enum values may be better ( just as we have for migration prio)
> +}
> +
> static const TypeInfo vtd_info = {
> .name = TYPE_INTEL_IOMMU_DEVICE,
> .parent = TYPE_X86_IOMMU_DEVICE,
> .instance_size = sizeof(IntelIOMMUState),
> + .instance_init = vtd_instance_init,
> .class_init = vtd_class_init,
> };
>
Thanks
Eric
- Re: [PATCH 2/4] reset: Allow multiple stages of system resets, (continued)
[PATCH 3/4] intel_iommu: Tear down address spaces before IOMMU reset, peterx, 2024/01/17
[PATCH 4/4] intel_iommu: Reset vIOMMU at the last stage of system reset, peterx, 2024/01/17
- Re: [PATCH 4/4] intel_iommu: Reset vIOMMU at the last stage of system reset,
Eric Auger <=
Re: [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices, Eric Auger, 2024/01/17