Re: [Qemu-ppc] [PATCH 0/3 QEMU v4] vfio on ppc64


From: Alex Williamson
Subject: Re: [Qemu-ppc] [PATCH 0/3 QEMU v4] vfio on ppc64
Date: Mon, 22 Apr 2013 19:58:12 -0600

On Tue, 2013-04-23 at 09:44 +1000, David Gibson wrote:
> On Mon, Apr 22, 2013 at 12:39:26PM -0600, Alex Williamson wrote:
> > On Mon, 2013-04-22 at 18:02 +1000, Alexey Kardashevskiy wrote:
> > > Yes, we are still tuning this stuff for us :)
> > 
> > Yay!
> > 
> > > Changes:
> > > 
> > > * new "spapr-pci-vfio-host-bridge" device is introduced
> > > (inherits from spapr-pci-host-bridge);
> > > 
> > > * the device#1->group->container->device#2,... creation scheme is changed
> > > to a group->container->devices scheme (on ppc64);
> > > 
> > > * removed vfio_iommu_spapr_tce_dma_map/vfio_iommu_spapr_tce_dma_unmap
> > > as they are not really necessary now and it does not look like we will
> > > need them in the near future.
> > > 
> > > Comments?
> > 
> > Not happy with this... :(
> 
> So, part of that will be my fault, not Alexey's, because of how I told
> him to do this.
> 
> > > Alexey Kardashevskiy (3):
> > >   vfio: make some of VFIO API public
> > 
> > Exports and conditionalizes vfio_get_group for no good reason other than
> > to get a pointer to a group that's not connected to a container.  Not
> > only is the bool parameter ugly, it has a poor interface - call it once
> > with "false" and get an unconnected group, call it for the same group
> > with "true", and still get back an unconnected group.  I'd say
> > vfio_get_group should be split to have group and container connect
> > separate, but I don't think you even need that...
> 
> The boolean is ugly, yes.  I was suggesting splitting a group_create()
> function from get_group().
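As a rough illustration only (the names below are hypothetical, not
existing QEMU API), such a split might expose two steps instead of a
bool parameter:

    /* Sketch of the proposed split; names are illustrative.
     * vfio_get_group() would only open and track /dev/vfio/<groupid>,
     * while attaching the group to a container becomes an explicit
     * second step that map-all callers perform immediately and
     * guest-visible callers defer to their host bridge. */
    VFIOGroup *vfio_get_group(int groupid);             /* open group, no container */
    int vfio_group_attach_container(VFIOGroup *group);  /* find/create and attach */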
> 
> > >   vfio: add support for spapr platform (powerpc64 server)
> > 
> > ...because this patch ignores the extensibility of
> > vfio_connect_container.  There is absolutely no reason for
> > vfio_container_spapr_alloc to exist.  It duplicates most of
> > vfio_connect_container, apparently using the attempt to reuse containers
> > as an excuse, even though this should happily fall through unless the
host code is broken.  vfio supports containers capable of holding
multiple groups; it does not assume or require it.
> 
> So, initially Alexey did extend vfio_connect_container(), but I think
> this is wrong.  vfio_connect_container() does something different from
> what we want on pseries, and not just because of details of the IOMMU
> interface.
> 
> Apart from the construction of the container object (which could be
> done by a shared helper), the guts of vfio_connect_container() does
> two things:
>       a) Finds a container for the group.  But it assumes any
> container is suitable, if it will accept the group.  That works for
> x86 because every container has the same mappings - all of RAM.  Even
> if we were able to extend the kernel interface to support multiple
> groups per container it would be wrong for pseries though: the PAPR
> interface requires that each guest host bridge have independent DMA
> mappings.  So the group *must* be bound to the container associated
> with this device's guest host bridge; no other choice can be correct.

This is not true.  vfio_connect_container _tests_ whether any of the
other containers are compatible; it makes no assumptions.  The
VFIO_GROUP_SET_CONTAINER ioctl must fail if the group is not compatible
with the existing container.  I'm not asking you to change that; I'm
asking that you implement exactly that, meaning that this loop:

    QLIST_FOREACH(container, &container_list, next) {
        if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
            group->container = container;
            QLIST_INSERT_HEAD(&container->group_list, group, container_next);
            return 0;
        }
    }

never reaches the inner branch on spapr.  If the spapr vfio iommu driver
instead allows, but does not actually support, multiple groups to be
connected to a container, that's a bug in the driver.  A group only does
this once, and only at setup, so I don't think there's a valid
performance argument here either.
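For what it's worth, here is a rough sketch (hypothetical helper name
and error handling, not actual code) of what the spapr half of
vfio_connect_container could boil down to: the same
VFIO_GROUP_SET_CONTAINER attach, a VFIO_SET_IOMMU selecting the spapr
TCE backend, and nothing else, leaving the window query and enable to
the host bridge code.

    /* Sketch only: a hypothetical spapr path that could be folded into
     * vfio_connect_container().  The ioctls and VFIO_SPAPR_TCE_IOMMU are
     * real kernel interfaces; the helper name and error handling are
     * illustrative. */
    static int vfio_spapr_connect_container(VFIOGroup *group,
                                            VFIOContainer *container)
    {
        /* Same attach step the type1 path uses; the kernel must fail
         * this if the group is not compatible with the container. */
        if (ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
            error_report("vfio: failed to set group container: %m");
            return -errno;
        }

        /* Select the spapr TCE iommu backend.  No MemoryListener is
         * registered here; the TCE window is queried and enabled later
         * by the host bridge code. */
        if (ioctl(container->fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU)) {
            error_report("vfio: failed to set iommu for container: %m");
            return -errno;
        }

        return 0;
    }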

>       b) It wires up the container to the MemoryListener, so that it
> maps all RAM.  Our model doesn't do that.

This is done within the type1 specific setup, ignore it.

> Note that both these considerations would become relevant for x86,
> too, if and when a guest-visible IOMMU interface is implemented: you
> wouldn't want to wire up a memory listener, and the (guest) device
> would have to be wired to the container associated with its guest
> IOMMU domain, not any accepting container.

At that point the problem is actually different because we potentially
have multiple viable iommu backends on the same host that will need to
behave a little differently.  That's not the case for spapr.  There's
only one viable iommu backend and it exactly fits the model that
vfio_connect_container was designed for.

> The fundamental point is we have two very different models of using
> the IOMMU, the "map all" and "guest visible" models.  And while it
> happens that the PAPR IOMMU has constraints which mean it can't be
> used in map all mode, and we never currently use the x86 IOMMUs in
> guest visible mode, there's nothing to prevent a future IOMMU being
> usable in either mode.  In fact there's nothing preventing using Type1
> in either mode right now, except for the exercise of coding the guest
> IOMMU interface into qemu.  So the model of IOMMU usage should not be
> treated as simply a detail of the IOMMU type.

If and when we have multiple viable iommu backends on a single platform,
I agree, we may need to be smarter about when groups get merged into the
same container.  We're not there yet.  spapr doesn't plan to support
multiple groups within a container, so whether we test it or not should
be a non-issue.  The current kernel implementation needs to disallow it.
The memory listener setup is completely contained within type1 setup, so
that should also be a non-issue.

> Now, what I do wonder is if we ought to have two different qemu device
> types: one for a vfio device in map-all mode, which will find and
> prepare its own container, and one in guest-visible mode, which must be
> told which container to use and fail if it can't add itself to that
> container.

A different vfio- device type is an option, but I'm not seeing a
sufficient argument above for why the existing model doesn't work.
Thanks,

Alex

> > >   spapr vfio: add spapr-pci-vfio-host-bridge to support vfio
> > 
> > And obviously this is the consumer of all that, which really just wants
> > some parameters from the iommu info ioctl and a way to call map and
> > unmap for dma.  What if instead of all of the above, we simply:
> > 
> > 1) export vfio_dma_map/vfio_dma_unmap
> > 2) add spapr support to vfio_connect_container (skip the info and enable
> > chunks)
> > 3) add to vfio.c and export:
> > 
> > VFIOContainer *vfio_spapr_php_init(int groupid,
> >                                    struct vfio_iommu_spapr_tce_info *info)
> > {
> >     VFIOGroup *group = vfio_get_group(groupid); // including connect
> >     ioctl(group->container->fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, info);
> >     ioctl(group->container->fd, VFIO_IOMMU_ENABLE);
> >     return group->container;
> > }
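(For context, a hypothetical caller on the spapr_pci side might use such
a helper roughly as below; the dma32_window_* fields come from struct
vfio_iommu_spapr_tce_info, everything else, including the sphb fields,
is illustrative.)

    /* Sketch of hypothetical usage; names and error handling are
     * illustrative, sphb stands in for the host bridge state. */
    struct vfio_iommu_spapr_tce_info info = { .argsz = sizeof(info) };
    VFIOContainer *container = vfio_spapr_php_init(groupid, &info);

    if (!container) {
        error_report("vfio: failed to set up container for group %d", groupid);
        return -1;
    }

    /* The returned window describes the guest-visible DMA range the
     * host bridge advertises and backs with TCE map/unmap calls. */
    sphb->dma_window_start = info.dma32_window_start;
    sphb->dma_window_size  = info.dma32_window_size;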
> > 
> > Isn't that all you really need?  Thanks,
> > 
> > Alex
> > 
> > >  hw/misc/vfio.c              |   94 ++++++++++++++++++++--
> > >  hw/ppc/spapr_iommu.c        |  151 +++++++++++++++++++++++++++--------
> > >  hw/ppc/spapr_pci.c          |  186 +++++++++++++++++++++++++++++++++++++++++--
> > >  include/hw/misc/vfio.h      |   20 +++++
> > >  include/hw/pci-host/spapr.h |   12 +++
> > >  include/hw/ppc/spapr.h      |    3 +
> > >  trace-events                |    4 +
> > >  7 files changed, 422 insertions(+), 48 deletions(-)
> > >  create mode 100644 include/hw/misc/vfio.h