[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about V
From: |
David Gibson |
Subject: |
Re: [Qemu-ppc] [PATCH qemu v2 2/2] spapr_pci: Remove constraints about VFIO-PCI devices |
Date: |
Mon, 14 Sep 2015 14:04:07 +1000 |
User-agent: |
Mutt/1.5.23 (2014-03-12) |
On Fri, Sep 11, 2015 at 02:03:38PM -0600, Alex Williamson wrote:
> On Wed, 2015-09-09 at 20:43 -0600, Alex Williamson wrote:
> > On Thu, 2015-09-03 at 14:40 +1000, Alexey Kardashevskiy wrote:
> > > So far there were 2 limitations enforced on an emulated PHB
> > > regarding VFIO:
> > > 1) only one IOMMU group per IOMMU container was allowed and
> > > the spapr-pci-vfio-host-bridge device has an IOMMU ID property for this;
> > > 2) only one IOMMU container per PHB was allowed as a group
> > > can only be attached to one container.
> > >
> > > However these are not really necessary so we are removing them here.
> > >
> > > This deprecates IOMMU group ID and changes vfio_container_do_ioctl()
> > > not to receive it. Instead of getting a container from a group ID,
> > > this calls ioctl() for every container attached to the PHB address space.
> > > This allows multiple containers on the same PHB which means multiple
> > > groups per PHB. Note that this will lead to every H_PUT_TCE&etc call
> > > to be passed to every container separately which will affect
> > > the performance. For 32bit devices it is still recommended to put
> > > every group to a separate PHB.
> > >
> > > Since the existing VFIO code is already trying to share a container for
> > > multiple groups, just removing a group id from
> > > the vfio_container_do_ioctl() allows the existing code to share
> > > a container if it is supported by the host kernel.
> > >
> > > This replaces the check for a group id to be set correctly with
> > > the check that it is not set.
> > >
> > > This removes casts to sPAPRPHBVFIOState as none of sPAPRPHBVFIOState
> > > members is accessed here.
> > >
> > > Signed-off-by: Alexey Kardashevskiy <address@hidden>
> > > ---
> > > hw/ppc/spapr_pci.c | 10 +++++-----
> > > hw/ppc/spapr_pci_vfio.c | 17 ++++++-----------
> > > hw/vfio/common.c | 22 ++++++----------------
> > > include/hw/vfio/vfio.h | 3 +--
> > > 4 files changed, 18 insertions(+), 34 deletions(-)
> > >
> > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > > index 4e289cb..1b7559d 100644
> > > --- a/hw/ppc/spapr_pci.c
> > > +++ b/hw/ppc/spapr_pci.c
> > > @@ -810,11 +810,6 @@ static int
> > > spapr_phb_dma_capabilities_update(sPAPRPHBState *sphb)
> > > pci_for_each_device(bus, pci_bus_num(bus), spapr_phb_walk_devices,
> > > sphb);
> > >
> > > if (sphb->vfio_num) {
> > > - if (sphb->iommugroupid == -1) {
> > > - error_report("Wrong IOMMU group ID %d", sphb->iommugroupid);
> > > - return -1;
> > > - }
> > > -
> > > ret = spapr_phb_vfio_dma_capabilities_update(sphb);
> > > if (ret) {
> > > error_report("Unable to get DMA32 info from VFIO");
> > > @@ -1282,6 +1277,11 @@ static void spapr_phb_realize(DeviceState *dev,
> > > Error **errp)
> > > PCIBus *bus;
> > > uint64_t msi_window_size = 4096;
> > >
> > > + if ((sphb->iommugroupid != -1) &&
> > > + object_dynamic_cast(OBJECT(sphb),
> > > TYPE_SPAPR_PCI_VFIO_HOST_BRIDGE)) {
> > > + error_report("Warning: iommugroupid is deprecated and will be
> > > ignored");
> > > + }
> > > +
> > > if (sphb->index != (uint32_t)-1) {
> > > hwaddr windows_base;
> > >
> > > diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> > > index f94d8a4..48137d5 100644
> > > --- a/hw/ppc/spapr_pci_vfio.c
> > > +++ b/hw/ppc/spapr_pci_vfio.c
> > > @@ -35,7 +35,7 @@ int
> > > spapr_phb_vfio_dma_capabilities_update(sPAPRPHBState *sphb)
> > > struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
> > > int ret;
> > >
> > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > + ret = vfio_container_ioctl(&sphb->iommu_as,
> > > VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> > > if (ret) {
> > > return ret;
> > > @@ -54,8 +54,7 @@ void spapr_phb_vfio_eeh_reenable(sPAPRPHBState *sphb)
> > > .op = VFIO_EEH_PE_ENABLE
> > > };
> > >
> > > - vfio_container_ioctl(&sphb->iommu_as,
> > > - sphb->iommugroupid, VFIO_EEH_PE_OP, &op);
> > > + vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > > }
> > >
> > > int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> > > @@ -81,8 +80,7 @@ int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
> > > return RTAS_OUT_PARAM_ERROR;
> > > }
> > >
> > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > - VFIO_EEH_PE_OP, &op);
> > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > > if (ret < 0) {
> > > return RTAS_OUT_HW_ERROR;
> > > }
> > > @@ -96,8 +94,7 @@ int spapr_phb_vfio_eeh_get_state(sPAPRPHBState *sphb,
> > > int *state)
> > > int ret;
> > >
> > > op.op = VFIO_EEH_PE_GET_STATE;
> > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > - VFIO_EEH_PE_OP, &op);
> > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > > if (ret < 0) {
> > > return RTAS_OUT_PARAM_ERROR;
> > > }
> > > @@ -170,8 +167,7 @@ int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int
> > > option)
> > > return RTAS_OUT_PARAM_ERROR;
> > > }
> > >
> > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > - VFIO_EEH_PE_OP, &op);
> > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > > if (ret < 0) {
> > > return RTAS_OUT_HW_ERROR;
> > > }
> > > @@ -185,8 +181,7 @@ int spapr_phb_vfio_eeh_configure(sPAPRPHBState *sphb)
> > > int ret;
> > >
> > > op.op = VFIO_EEH_PE_CONFIGURE;
> > > - ret = vfio_container_ioctl(&sphb->iommu_as, sphb->iommugroupid,
> > > - VFIO_EEH_PE_OP, &op);
> > > + ret = vfio_container_ioctl(&sphb->iommu_as, VFIO_EEH_PE_OP, &op);
> > > if (ret < 0) {
> > > return RTAS_OUT_PARAM_ERROR;
> > > }
> > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > index 85ee9b0..a00644e 100644
> > > --- a/hw/vfio/common.c
> > > +++ b/hw/vfio/common.c
> > > @@ -926,35 +926,25 @@ void vfio_put_base_device(VFIODevice *vbasedev)
> > > close(vbasedev->fd);
> > > }
> > >
> > > -static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
> > > - int req, void *param)
> > > +static int vfio_container_do_ioctl(AddressSpace *as, int req, void
> > > *param)
> > > {
> > > - VFIOGroup *group;
> > > VFIOContainer *container;
> > > int ret = -1;
> > > + VFIOAddressSpace *space = vfio_get_address_space(as);
> > >
> > > - group = vfio_get_group(groupid, as);
> > > - if (!group) {
> > > - error_report("vfio: group %d not registered", groupid);
> > > - return ret;
> > > - }
> > > -
> > > - container = group->container;
> > > - if (group->container) {
> > > + QLIST_FOREACH(container, &space->containers, next) {
> > > ret = ioctl(container->fd, req, param);
> > > if (ret < 0) {
> > > error_report("vfio: failed to ioctl %d to container: ret=%d,
> > > %s",
> > > _IOC_NR(req) - VFIO_BASE, ret, strerror(errno));
> > > + return -errno;
> > > }
> > > }
> >
> > This highlights how terrible this ioctl passthrough interface is; what's
> > a caller supposed to do on error? Here we don't have visibility into
> > the ioctl being called in order to do any unwind on error. The caller
> > doesn't have the context to unwind only the failed containers. Is
> > returning errno here really sufficient for anyone to figure out how to
> > proceed or should we just cut our loses and abort()? What's the least
> > horrible way we can realistically handle a failure here? Thanks,
>
> It seemed like I asked this before and it turns out that I did:
>
> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07399.html
>
> "Gavin says yes" is not really a supportable solution when I stumbled on
> it again and David doesn't know why it's safe either (nor does it answer
> my question of how does this work). At a minimum, the reasoning why
> this is safe for the ioctls we currently allow here needs to be spelled
> out in a comment. We could easily make the mistake of adding another
> ioctl to the passthrough for which those assumptions are not valid. We
> may even want to abort if it's not one of the ioctls we've evaluated so
> we don't overlook this problem later. Thanks,
Yeah, you're right. This is a mess.
What we need to do here is to make vfio_container_ioctl() take a
VFIOContainer object as the name suggests. The the callers will need
to either use it on a specific container, or iterate across all the
containers in the VFIOAddressSpace as appropriate for the specific
operation.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
pgpkKl72e1QY3.pgp
Description: PGP signature
[Qemu-ppc] [PATCH qemu v2 1/2] spapr_pci_vfio: Remove redundant spapr-pci-vfio-host-bridge, Alexey Kardashevskiy, 2015/09/03