

From: Alexey Kardashevskiy
Subject: Re: [Qemu-ppc] [PATCH qemu v8 00/14] spapr: vfio: Enable Dynamic DMA windows (DDW)
Date: Wed, 24 Jun 2015 20:52:40 +1000
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.0.1

On 06/23/2015 04:44 PM, David Gibson wrote:
On Thu, Jun 18, 2015 at 09:37:22PM +1000, Alexey Kardashevskiy wrote:

(cut-n-paste from kernel patchset)

Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus
where devices are allowed to do DMA. These ranges are called DMA windows.
By default there is a single DMA window, 1GB or 2GB in size, mapped at
bus address zero on the PCI bus.
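
For illustration, the default window is described to the guest by the
"ibm,dma-window" device tree property of the PE's node; a simplified
sketch of reading it (the helper name is made up, the cell counts
assume the usual pseries layout):

/* Simplified sketch: read the default DMA window of a PE from the
 * device tree.  "ibm,dma-window" is <liobn  bus-address  size>, with
 * the address/size widths given by ibm,#dma-address-cells and
 * ibm,#dma-size-cells (typically 2 cells each on pseries). */
static int read_default_window(struct device_node *pdn,
                               u64 *liobn, u64 *offset, u64 *size)
{
        const __be32 *dw = of_get_property(pdn, "ibm,dma-window", NULL);

        if (!dw)
                return -ENODEV;

        *liobn  = of_read_number(dw, 1);     /* window id (LIOBN) */
        *offset = of_read_number(dw + 1, 2); /* bus address, usually 0 */
        *size   = of_read_number(dw + 3, 2); /* usually 1GB or 2GB */

        return 0;
}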

PAPR defines a DDW RTAS API which allows pseries guests to query the
hypervisor about DDW support and capabilities (the page size mask for
now). Using this RTAS API, a pseries guest may request DMA windows in
addition to the default one.
Existing pseries Linux guests request an additional window as big as
the guest RAM and map the entire window, which effectively creates a
direct mapping of guest memory to the PCI bus.
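
For context, the RTAS calls involved are "ibm,query-pe-dma-window" and
"ibm,create-pe-dma-window". Below is a minimal guest-side sketch; the
response layouts follow PAPR, but the helper itself (ddw_enable and its
token arguments) is illustrative, not the actual kernel code:

/* Guest-side DDW sketch: query window availability, then create a huge
 * window.  Response layouts follow PAPR; everything else here is
 * illustrative (the real code lives in the pseries IOMMU setup). */
struct ddw_query_response {
        u32 windows_available;       /* how many more windows can be created */
        u32 largest_available_block; /* in IOMMU pages */
        u32 page_size;               /* supported page size mask (4K/64K/16M) */
        u32 migration_capable;
};

struct ddw_create_response {
        u32 liobn;                   /* LIOBN (window id) of the new window */
        u32 addr_hi;                 /* bus address where the window starts */
        u32 addr_lo;
};

static int ddw_enable(int query_token, int create_token, u32 cfg_addr,
                      u64 buid, int page_shift, int window_shift)
{
        struct ddw_query_response query;
        struct ddw_create_response create;
        int ret;

        /* "ibm,query-pe-dma-window": 3 inputs, 5 outputs */
        ret = rtas_call(query_token, 3, 5, (u32 *)&query,
                        cfg_addr, BUID_HI(buid), BUID_LO(buid));
        if (ret || !query.windows_available)
                return -ENODEV;

        /* "ibm,create-pe-dma-window": 5 inputs, 4 outputs */
        return rtas_call(create_token, 5, 4, (u32 *)&create,
                         cfg_addr, BUID_HI(buid), BUID_LO(buid),
                         page_shift, window_shift);
}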

This patchset reworks PPC64 IOMMU code and adds necessary structures
to support big windows.

Once a Linux guest discovers the presence of DDW, it does the following:
1. query the hypervisor for the number of available windows and the
   page size masks;
2. create a window with the biggest possible page size (today 4K/64K/16M);
3. map the entire guest RAM via H_PUT_TCE* hypercalls;
4. switch dma_ops to direct_dma_ops on the selected PE.

Once this is done, H_PUT_TCE is no longer called for 64bit devices and
the guest does not waste time on DMA map/unmap operations.
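
To give an idea of step 3, here is a minimal sketch of mapping a range
of guest RAM through the new window with H_PUT_TCE_INDIRECT, 512 TCEs
(one 4K page of TCE entries) per hypercall; the helper and its
arguments are illustrative, not the actual guest code:

#define TCES_PER_HCALL  512     /* one 4K page of 8-byte TCEs per hcall */

static long map_ram_range(u64 liobn, u64 ioba, u64 gpa, u64 npages,
                          unsigned int page_shift, __be64 *tce_page)
{
        long rc = 0;

        while (npages) {
                unsigned long i, n = min_t(u64, npages, TCES_PER_HCALL);

                /* Build a page worth of TCEs: guest physical address
                 * plus read/write permission bits. */
                for (i = 0; i < n; i++)
                        tce_page[i] = cpu_to_be64((gpa + (i << page_shift)) |
                                                  TCE_PCI_READ | TCE_PCI_WRITE);

                rc = plpar_hcall_norets(H_PUT_TCE_INDIRECT, liobn, ioba,
                                        (u64)__pa(tce_page), n);
                if (rc)
                        break;

                gpa += n << page_shift;
                ioba += n << page_shift;
                npages -= n;
        }

        return rc;
}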

Note that 32bit devices won't use DDW and will keep using the default
DMA window, so KVM optimizations will be required (to be posted later).

This patchset adds DDW support for pseries. Host kernel changes are
required as well; they were posted as:

[PATCH kernel v11 00/34] powerpc/iommu/vfio: Enable Dynamic DMA windows

This patchset is based on git://github.com/dgibson/qemu.git spapr-next branch.

A couple of general queries - this touches on the kernel part as well
as the qemu part:

  * Am I correct in thinking that the point of doing the
    pre-registration stuff is to allow the kernel to handle PUT_TCE
    in real mode?  i.e. that the advantage of doing preregistration
    rather than accounting on the DMA_MAP and DMA_UNMAP itself only
    appears once you have kernel KVM+VFIO acceleration?


Handling PUT_TCE includes 2 things:
1. get_user_pages_fast() and put_page()
2. update locked_vm

Both are tricky in real mode, but 2) is also tricky in virtual mode: I
have to deal with multiple unrelated 32bit and 64bit windows (VFIO does
not care whether they belong to one or many processes) with an IOMMU
page size of 4K, while gup/put_page work with 64K pages (our default
host kernel page size).
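
For reference, the virtual-mode side of 2) is essentially the usual
RLIMIT_MEMLOCK bookkeeping; a simplified sketch (the actual code in the
kernel series differs in details):

/* Simplified sketch of virtual-mode locked_vm accounting for 2) above;
 * not the exact code from the kernel patchset. */
static long try_increment_locked_vm(long npages)
{
        long ret = 0, locked, lock_limit;

        if (!current || !current->mm)
                return -ESRCH;
        if (!npages)
                return 0;

        down_write(&current->mm->mmap_sem);
        locked = current->mm->locked_vm + npages;
        lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
        if (locked > lock_limit && !capable(CAP_IPC_LOCK))
                ret = -ENOMEM;
        else
                current->mm->locked_vm += npages;
        up_write(&current->mm->mmap_sem);

        return ret;
}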

But yes, without keeping real mode handlers in mind, this thing could have been made simpler.


  * Do you have test numbers to show that it's still worthwhile to have
    kernel acceleration once you have a guest using DDW?  With DDW in
    play, even if PUT_TCE is slow, it should be called a lot less
    often.

With DDW, the whole of guest RAM is mapped once, when the guest first
calls set_dma_mask(64bit); it is just a few H_PUT_TCE_INDIRECT calls.

If the guest uses DDW, real mode handlers cannot possibly beat it, and
I have reports that real mode handlers are noticeably slower than
direct DMA mapping (i.e. DDW) for 40Gb devices (10Gb seems to be fine
but I have not tried a dozen guests yet).


The reason I ask is that the preregistration handling is a pretty big
chunk of code that inserts itself into some pretty core kernel data
structures, all for one pretty specific use case.  We only want to do
that if there's a strong justification for it.

Exactly. I keep asking Ben and Paul periodically if we want to keep it and the answer is always yes :)


About "vfio: spapr: Move SPAPR-related code to a separate file" - I guess I better off removing it for now, right?



--
Alexey


