RE: [RFC PATCH v4 00/24] vfio: Adopt iommufd
From: Duan, Zhenzhong
Subject: RE: [RFC PATCH v4 00/24] vfio: Adopt iommufd
Date: Tue, 1 Aug 2023 08:28:01 +0000

Ping, any comments or suggestions are appreciated.
Thanks
Zhenzhong
>-----Original Message-----
>From: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Sent: Wednesday, July 12, 2023 3:25 PM
>To: qemu-devel@nongnu.org
>Cc: alex.williamson@redhat.com; clg@redhat.com; jgg@nvidia.com;
>nicolinc@nvidia.com; eric.auger@redhat.com; peterx@redhat.com;
>jasonwang@redhat.com; Tian, Kevin <kevin.tian@intel.com>; Liu, Yi L
><yi.l.liu@intel.com>; Sun, Yi Y <yi.y.sun@intel.com>; Peng, Chao P
><chao.p.peng@intel.com>; Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Subject: [RFC PATCH v4 00/24] vfio: Adopt iommufd
>
>With the introduction of iommufd, the Linux kernel provides a generic
>interface for userspace drivers to propagate their DMA mappings to the
>kernel for assigned devices. This series ports the VFIO devices onto the
>/dev/iommu uAPI and lets them coexist with the legacy implementation.
>
>This QEMU integration is the result of collaborative work between
>Yi Liu, Yi Sun, Nicolin Chen and Eric Auger.
>
>At the QEMU level, interactions with /dev/iommu are abstracted by a new
>iommufd object (compiled in with the CONFIG_IOMMUFD option).
>
>Any QEMU device (e.g. a vfio device) wishing to use /dev/iommu must be
>linked with an iommufd object. In this series, only the vfio-pci device
>gains this capability (other VFIO devices are not yet ready).
>
>It gets a new optional parameter named iommufd, which allows passing
>an iommufd object:
>
> -object iommufd,id=iommufd0
> -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
>
>Note that /dev/iommu and the vfio cdev can be opened externally by a
>management layer. In that case the fds are passed:
>
> -object iommufd,id=iommufd0,fd=22
> -device vfio-pci,iommufd=iommufd0,fd=23
>
>If the fd parameter is not passed, the fd is opened by QEMU.
>See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
>for a detailed discussion of this requirement.
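
To make the fd-passing case concrete, here is a minimal C sketch of how a management layer might format the two options once it already holds both fds. The helper name format_fd_passing_args() and its id parameter are hypothetical and not part of this series; only the option syntax is taken from the example quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical management-layer helper (not part of the series):
 * format the QEMU options for an iommufd object and a vfio-pci device
 * whose fds were opened externally.
 *
 * Returns the number of characters written (excluding the NUL), or -1
 * if an argument is invalid or the buffer is too small.
 */
static int format_fd_passing_args(char *buf, size_t len, const char *id,
                                  int iommufd_fd, int cdev_fd)
{
    assert(buf != NULL && id != NULL);

    /* An empty object id or a negative fd cannot be passed to QEMU. */
    if (strlen(id) == 0 || iommufd_fd < 0 || cdev_fd < 0) {
        return -1;
    }

    int n = snprintf(buf, len,
                     "-object iommufd,id=%s,fd=%d "
                     "-device vfio-pci,iommufd=%s,fd=%d",
                     id, iommufd_fd, id, cdev_fd);

    /* snprintf reports the untruncated length; treat truncation as error. */
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Called with id "iommufd0" and fds 22 and 23, this produces the two options from the example on a single line. The sketch assumes the caller has already made sure both fds stay open in the QEMU process (for instance by not marking them close-on-exec).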
>
>If no iommufd option is passed to the vfio-pci device, iommufd is not
>used and the end-user gets the behavior based on the legacy vfio iommu
>interfaces:
>
> -device vfio-pci,host=0000:02:00.0
>
>While the legacy kernel interface is group-centric, the new iommufd
>interface is device-centric, relying on device fd and iommufd.
>
>To support both interfaces in the QEMU VFIO device we reworked the vfio
>container abstraction so that the generic VFIO code can use either
>backend.
>
>The VFIOContainer object becomes a base object derived into
>a) the legacy VFIO container and
>b) the new iommufd based container.
>
>The base object implements generic code, such as the code related to
>memory_listener and address space management, whereas the derived
>objects implement callbacks specific to each BE (legacy and iommufd).
>Indeed, each backend has its own way to set up the secure context
>and its own DMA management interface. The diagram below shows how it
>looks with both BEs.
>
>                    VFIO                           AddressSpace/Memory
>    +-------+  +----------+  +-----+  +-----+
>    |  pci  |  | platform |  |  ap |  | ccw |
>    +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
>        |           |           |        |        |   AddressSpace       |
>        |           |           |        |        +------------+---------+
>    +---V-----------V-----------V--------V----+               /
>    |           VFIOAddressSpace              | <------------+
>    |                  |                      |  MemoryListener
>    |          VFIOContainer list             |
>    +-------+----------------------------+----+
>            |                            |
>            |                            |
>    +-------V------+            +--------V----------+
>    |   iommufd    |            |    vfio legacy    |
>    |  container   |            |     container     |
>    +-------+------+            +--------+----------+
>            |                            |
>            | /dev/iommu                 | /dev/vfio/vfio
>            | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
>Userspace   |                            |
>============+============================+===========================
>Kernel      |  device fd                 |
>            +---------------+            | group/container fd
>            | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
>            |  ATTACH_IOAS) |            | device fd
>            |               |            |
>            |       +-------V------------V-----------------+
>    iommufd |       |                vfio                  |
>(map/unmap  |       +---------+--------------------+-------+
>ioas_copy)  |                 |                    | map/unmap
>            |                 |                    |
>     +------V-------+   +-----V------+      +------V--------+
>     | iommufd core |   |  device    |      |  vfio iommu   |
>     +--------------+   +------------+      +---------------+
>
>[Secure Context setup]
>- iommufd BE: uses device fd and iommufd to set up the secure context
> (bind_iommufd, attach_ioas)
>- vfio legacy BE: uses group fd and container fd to set up the secure context
> (set_container, set_iommu)
>[Device access]
>- iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
>- vfio legacy BE: device fd is retrieved from group fd ioctl
>[DMA Mapping flow]
>1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
>2. VFIO populates DMA map/unmap via the container BEs
> *) iommufd BE: uses iommufd
> *) vfio legacy BE: uses container fd
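
The split between generic code and backend callbacks described above can be sketched in plain C as an ops table. This is only an illustration under stated assumptions: every name here (SketchContainer, SketchContainerOps, sketch_region_add() and so on) is hypothetical and does not mirror the actual types or callbacks in the series, and the "steps" are just strings appended to a log rather than real ioctls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct SketchContainer SketchContainer;

/* Backend-specific callbacks (hypothetical, for illustration only). */
typedef struct SketchContainerOps {
    const char *name;
    int (*attach_device)(SketchContainer *c, const char *dev);
    int (*dma_map)(SketchContainer *c, uint64_t iova, uint64_t size);
} SketchContainerOps;

/* Base container: generic state plus a pointer to the backend ops. */
struct SketchContainer {
    const SketchContainerOps *ops;
    char log[256];              /* comma-separated trace of backend steps */
};

static void sketch_log(SketchContainer *c, const char *step)
{
    size_t used = strlen(c->log);
    snprintf(c->log + used, sizeof(c->log) - used, "%s%s",
             used ? "," : "", step);
}

/* Legacy BE: group/container fds set up the context, then the device fd. */
static int legacy_attach(SketchContainer *c, const char *dev)
{
    (void)dev;
    sketch_log(c, "set_container");
    sketch_log(c, "set_iommu");
    sketch_log(c, "get_device_fd");
    return 0;
}

static int legacy_dma_map(SketchContainer *c, uint64_t iova, uint64_t size)
{
    (void)iova; (void)size;
    sketch_log(c, "container_fd_map");
    return 0;
}

/* iommufd BE: the device cdev is opened first, then bound and attached. */
static int iommufd_attach(SketchContainer *c, const char *dev)
{
    (void)dev;
    sketch_log(c, "open_cdev");
    sketch_log(c, "bind_iommufd");
    sketch_log(c, "attach_ioas");
    return 0;
}

static int iommufd_dma_map(SketchContainer *c, uint64_t iova, uint64_t size)
{
    (void)iova; (void)size;
    sketch_log(c, "iommufd_map");
    return 0;
}

static const SketchContainerOps legacy_ops = {
    "legacy", legacy_attach, legacy_dma_map
};
static const SketchContainerOps iommufd_ops = {
    "iommufd", iommufd_attach, iommufd_dma_map
};

/* The iommufd BE is chosen only when an iommufd object id is given. */
static void sketch_container_init(SketchContainer *c, const char *iommufd_id)
{
    memset(c, 0, sizeof(*c));
    c->ops = iommufd_id ? &iommufd_ops : &legacy_ops;
}

/* Generic path: a MemoryListener region_add ends up in the BE's dma_map. */
static int sketch_region_add(SketchContainer *c, uint64_t iova, uint64_t size)
{
    return c->ops->dma_map(c, iova, size);
}
```

The generic caller never checks which backend it is talking to; it only goes through c->ops, which is the property the base container is meant to provide.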
>
>This series depends on Yi's kernel series:
>"[PATCH v14 00/26] Add vfio_device cdev for iommufd support"
>https://lore.kernel.org/all/20230711025928.6438-1-yi.l.liu@intel.com/
>and
>"[PATCH v9 00/10] Enhance vfio PCI hot reset for vfio cdev device"
>https://lore.kernel.org/kvm/20230711023126.5531-1-yi.l.liu@intel.com/
>
>which can be found at:
>https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v14
>
>This qemu series can be found at:
>https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_rfcv4
>
>Tests done:
>- PCI devices were tested
>- platform, ccw and ap were only compile-tested
>- FD passing and hot reset were tested with some tricks
>- device hotplug test with legacy and iommufd backends (limited tests)
>- vIOMMU test run for both legacy and iommufd backends (limited tests)
>
>
>Given some iommufd kernel limitations, the iommufd backend is
>not yet fully on par with the legacy backend w.r.t. features like:
>- p2p mappings (you will see related error traces)
>- live migration
>- etc.
>
>About TODOs in rfcv3:
>- Add DMA alias check for iommufd BE (group level)
>attach_ioas will fail for an aliased device, so I think that's not a problem.
>
>- Make pci.c BE agnostic. Needs a kernel change as well to fix the
> VFIO_DEVICE_PCI_HOT_RESET gap
>I didn't make pci.c fully group agnostic because PCI device reset is a
>device-scope operation, and forcing it into a container-scope callback
>isn't a good idea. Instead I added iommufd code in pci.c and fixed the
>VFIO_DEVICE_PCI_HOT_RESET gap there.
>
>- Cleanup the VFIODevice fields as it's used in both backends
>- Replace list with g_tree
>This TODO is not viable because the iterator callback depends on the list element.
>
>- Add locks
>I think it's not necessary as the BQL already ensures that.
>
>base-commit: 887cba855b
>
>Change log:
>v3 -> v4:
>- rebase on top of v8.0.3
>- Add one patch from Yi which is about vfio device add in kvm
>- Remove IOAS_COPY optimization and focus on functions in this patchset
>- Fix wrong name issue reported and fix suggested by Matthew
>- Fix compilation issue reported and fix suggested by Nicolin
>- Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
>granularity
>- Add dev_iter_next() callback to avoid adding so many callbacks
> at container scope, add VFIODevice.hwpt to support that
>- Restore all functions back to common from container whenever possible,
> mainly migration and reset related functions
>- Add --enable/disable-iommufd config option, enabled by default on Linux
>- Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
>- Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
>- vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
>redundant code
>- Add FD passing support for vfio device backed by IOMMUFD
>- Fix hot unplug resource leak issue in vfio_legacy_detach_device()
>- Fix FD leak in vfio_get_devicefd()
>
>v3: https://lists.nongnu.org/archive/html/qemu-devel/2023-01/msg07189.html
>
>v2 -> v3:
>- rebase on top of v7.2.0
>- Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
> VFIO backends
>- Fix use after free in error path, reported by Alister
>- Split common.c in several steps to ease the review
>
>v1 -> v2:
>- remove the first three patches of rfcv1
>- add open cdev helper suggested by Jason
>- remove the QOMification of the VFIOContainer and simply use standard ops
>(David)
>- add "-object iommufd" suggested by Alex
>
>v1: https://lore.kernel.org/qemu-devel/20220414104710.28534-1-yi.l.liu@intel.com/
>
>Thanks,
>Yi, Yi, Eric, Zhenzhong
>
>Eric Auger (9):
> scripts/update-linux-headers: Add iommufd.h
> vfio/common: Introduce vfio_container_add|del_section_window()
> vfio/container: Introduce vfio_[attach/detach]_device
> vfio/platform: Use vfio_[attach/detach]_device
> vfio/ap: Use vfio_[attach/detach]_device
> vfio/ccw: Use vfio_[attach/detach]_device
> vfio/container-base: Introduce [attach/detach]_device container
> callbacks
> backends/iommufd: Introduce the iommufd object
> vfio/as: Allow the selection of a given iommu backend
>
>Yi Liu (6):
> vfio/common: Move IOMMU agnostic helpers to a separate file
> vfio/common: Move legacy VFIO backend code into separate container.c
> vfio/common: Rename into as.c
> vfio: Add base container
> util/char_dev: Add open_cdev()
> vfio/iommufd: Implement the iommufd backend
>
>Zhenzhong Duan (9):
> Update linux-header per VFIO device cdev v14
> vfio/common: Extract out vfio_kvm_device_[add/del]_fd
> vfio/common: Add a vfio device iterator
> vfio/common: Refactor vfio_viommu_preset() to be group agnostic
> vfio/as: Simplify vfio_viommu_preset()
> Add iommufd configure option
> vfio/as: Add vfio device iterator callback for iommufd
> vfio/pci: Adapt vfio pci hot reset support with iommufd BE
> vfio/iommufd: Make vfio cdev pre-openable by passing a file handle
>
> MAINTAINERS | 13 +
> backends/Kconfig | 4 +
> backends/iommufd.c | 268 +++
> backends/meson.build | 3 +
> backends/trace-events | 12 +
> hw/vfio/ap.c | 66 +-
> hw/vfio/as.c | 1555 +++++++++++++
> hw/vfio/ccw.c | 122 +-
> hw/vfio/common.c | 3078 -------------------------
> hw/vfio/container-base.c | 146 ++
> hw/vfio/container.c | 1218 ++++++++++
> hw/vfio/helpers.c | 598 +++++
> hw/vfio/iommufd.c | 546 +++++
> hw/vfio/meson.build | 8 +-
> hw/vfio/pci.c | 354 ++-
> hw/vfio/platform.c | 43 +-
> hw/vfio/spapr.c | 22 +-
> hw/vfio/trace-events | 16 +-
> include/hw/vfio/vfio-common.h | 109 +-
> include/hw/vfio/vfio-container-base.h | 158 ++
> include/qemu/char_dev.h | 16 +
> include/sysemu/iommufd.h | 47 +
> linux-headers/linux/iommufd.h | 347 +++
> linux-headers/linux/kvm.h | 13 +-
> linux-headers/linux/vfio.h | 142 +-
> meson.build | 6 +
> meson_options.txt | 2 +
> qapi/qom.json | 18 +-
> qemu-options.hx | 13 +
> scripts/meson-buildoptions.sh | 3 +
> scripts/update-linux-headers.sh | 3 +-
> util/chardev_open.c | 61 +
> util/meson.build | 1 +
> 33 files changed, 5601 insertions(+), 3410 deletions(-)
> create mode 100644 backends/iommufd.c
> create mode 100644 hw/vfio/as.c
> delete mode 100644 hw/vfio/common.c
> create mode 100644 hw/vfio/container-base.c
> create mode 100644 hw/vfio/container.c
> create mode 100644 hw/vfio/helpers.c
> create mode 100644 hw/vfio/iommufd.c
> create mode 100644 include/hw/vfio/vfio-container-base.h
> create mode 100644 include/qemu/char_dev.h
> create mode 100644 include/sysemu/iommufd.h
> create mode 100644 linux-headers/linux/iommufd.h
> create mode 100644 util/chardev_open.c
>
>--
>2.34.1