Re: [Qemu-devel] [RFC 1/2] pci-dma-api-v1


From: Blue Swirl
Subject: Re: [Qemu-devel] [RFC 1/2] pci-dma-api-v1
Date: Thu, 27 Nov 2008 21:14:45 +0200

On 11/27/08, Andrea Arcangeli <address@hidden> wrote:
> Hello everyone,
>
>  One major limitation for KVM today is the lack of a proper way to
>  write drivers so that the host OS can use direct DMA to the guest
>  physical memory and avoid any intermediate copy. The only API
>  provided to drivers seems to be cpu_physical_memory_rw, and that
>  forces all drivers to bounce, trash CPU caches and be memory
>  bound. This new DMA API instead lets drivers use a pci_dma_sg
>  method for SG I/O that translates the guest physical addresses to
>  host virtual addresses and then calls two operations: a submit
>  method and a complete method. pci_dma_sg may have to bounce
>  buffer internally, and to limit the max bounce size it may have
>  to submit the I/O in pieces with multiple submit calls. The patch
>  adapts the ide.c HD driver to use this. Once cdrom is converted
>  too, dma_buf_rw can be eliminated. As you can see, the new
>  ide_dma_submit and ide_dma_complete code is much more readable
>  than the previous rearming callback.
>
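
A minimal sketch of what such a submit/complete scatter-gather
interface could look like; every name and signature below is an
illustrative guess, not code taken from the patch:

    /* Illustrative only: a possible shape for a submit/complete SG API. */
    #include <sys/uio.h>     /* struct iovec */
    #include <stdint.h>
    #include <stddef.h>

    typedef struct DMASGEntry {
        uint64_t guest_paddr;          /* guest physical address */
        size_t   len;
    } DMASGEntry;

    /* Called for each piece the DMA layer managed to map (or bounce);
     * iov/iovcnt describe host virtual memory ready for readv/writev. */
    typedef int DMASubmitFunc(void *opaque, struct iovec *iov, int iovcnt);

    /* Called once, after the last piece has completed. */
    typedef void DMACompleteFunc(void *opaque, int ret);

    void pci_dma_sg(void *pci_dev, DMASGEntry *sg, int nsg, int is_write,
                    DMASubmitFunc *submit, DMACompleteFunc *complete,
                    void *opaque);

The submit/complete split is what allows bounded bouncing: each mapped
or bounced chunk is handed to the submit method, and the complete
method fires only once, after the final chunk.
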
>  This is only tested with KVM so far, but qemu builds; in general
>  there's nothing KVM-specific here (with the exception of a single
>  kvm_enabled check), so it should work well for both.
>
>  All we care about is the performance of the direct path, so I tried
>  to avoid dynamic allocations there to avoid entering glibc. The
>  current logic doesn't satisfy me yet, but it should at least be
>  faster than calling malloc (I'm still working on it, to avoid
>  memory waste and to detect when more than one iovec should be
>  cached). In case of instabilities, the first thing I recommend is
>  setting MAX_IOVEC_IOVCNT to 0 to disable that logic ;). I also
>  recommend testing with DEBUG_BOUNCE and with a 512-byte max bounce
>  buffer. It's running stable in all modes so far. However, if ide.c
>  ends up calling aio_cancel, things will likely fall apart, but this
>  is all because of bdrv_aio_readv/writev and the astonishing lack of
>  aio_readv/writev in glibc!
>
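
As an illustration of the kind of malloc avoidance being described
(hypothetical code, not taken from the patch; MAX_IOVEC_IOVCNT below
only mirrors the knob named above), a small pool of preallocated
iovec arrays keeps the common case out of glibc:

    /* Hypothetical sketch of an iovec cache for the DMA fast path. */
    #include <sys/uio.h>
    #include <stdlib.h>

    #define MAX_IOVEC_IOVCNT 512   /* iovecs per cached array */
    #define IOVEC_CACHE_SIZE 4     /* how many arrays are kept around */

    static struct iovec iovec_cache[IOVEC_CACHE_SIZE][MAX_IOVEC_IOVCNT];
    static int iovec_cache_used[IOVEC_CACHE_SIZE];

    /* Return a preallocated array when one fits and is free,
     * otherwise fall back to malloc (the slow path). */
    static struct iovec *iovec_get(int iovcnt)
    {
        int i;

        if (iovcnt <= MAX_IOVEC_IOVCNT) {
            for (i = 0; i < IOVEC_CACHE_SIZE; i++) {
                if (!iovec_cache_used[i]) {
                    iovec_cache_used[i] = 1;
                    return iovec_cache[i];
                }
            }
        }
        return malloc(iovcnt * sizeof(struct iovec));
    }

    /* Release an array back to the pool, or free it if it was malloc'd. */
    static void iovec_put(struct iovec *iov)
    {
        int i;

        for (i = 0; i < IOVEC_CACHE_SIZE; i++) {
            if (iov == iovec_cache[i]) {
                iovec_cache_used[i] = 0;
                return;
            }
        }
        free(iov);
    }
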
>  Once we finish fixing storage performance with a real
>  bdrv_aio_readv/writev (now a blocker issue), a pci_dma_single can
>  be added for zero-copy networking (one NIC per VM, or VMDq, IOV,
>  etc.). The DMA API should allow for that too.

The previous similar attempt by Anthony for generic DMA using vectored
IO was abandoned because the malloc/free overhead was more than the
performance gain. Have you made any performance measurements? How does
this version compare to the previous ones?

I think the pci_ prefix can be removed; there is little that is
PCI-specific.

For the Sparc32 IOMMU (and probably other IOMMUs), it should be
possible to register a function used in place of cpu_physical_memory_rw,
c_p_m_can_dma, etc. The goal is that it should be possible to stack the
DMA resolvers (think of devices behind a number of buses).
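
One possible shape for that kind of stacking, sketched only to
illustrate the idea (none of these names or types exist in qemu):

    /* Sketch: a per-bus translation hook that can be chained, so a
     * device behind an IOMMU behind a bridge resolves addresses in
     * stages before the root (guest RAM) is reached. */
    #include <stdint.h>
    #include <stddef.h>

    typedef uint64_t dma_addr_t;

    typedef struct DMAResolver DMAResolver;
    struct DMAResolver {
        /* Translate one bus address into an address valid on the parent
         * bus, clamping *len to the contiguous region; < 0 on fault. */
        int (*translate)(DMAResolver *r, dma_addr_t addr, dma_addr_t *out,
                         size_t *len, int is_write);
        DMAResolver *parent;   /* next hop; NULL at the root */
        void *opaque;          /* e.g. the IOMMU state */
    };

    /* Walk the chain; the final address is a guest physical address
     * that cpu_physical_memory_rw (or a map/unmap pair) can use. */
    static int dma_resolve(DMAResolver *r, dma_addr_t addr, dma_addr_t *out,
                           size_t *len, int is_write)
    {
        *out = addr;
        while (r) {
            if (r->translate(r, *out, out, len, is_write) < 0) {
                return -1;
            }
            r = r->parent;
        }
        return 0;
    }
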



