On 11/28/08, Andrea Arcangeli <address@hidden> wrote:
On Thu, Nov 27, 2008 at 09:14:45PM +0200, Blue Swirl wrote:
> The previous similar attempt by Anthony for generic DMA using vectored
> IO was abandoned because the malloc/free overhead was more than the
Even if there were dynamic allocations in the fast path, the overhead
of malloc/free is nothing compared to running a host kernel syscall
and waiting for it to return every 4k, not to mention with O_DIRECT
enabled, which is the whole point of having a direct-DMA API that truly
doesn't pollute the cache. With O_DIRECT but without a real readv/writev,
I/O performance would be destroyed, dropping to something like 10M/sec
even on the fastest storage/CPU/RAM combinations.
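To illustrate the readv/writev point above, here is a minimal sketch (not QEMU code) that submits several scattered 4 KiB buffers with a single writev() and reads them back with a single readv(), instead of issuing one syscall per 4k page. The helper names (write_pages, read_pages, roundtrip_ok), the page count and the temp-file path are hypothetical. O_DIRECT is deliberately left out: it requires buffers, lengths and offsets aligned to the device's block size and is not supported on every filesystem, so the sketch only shows the syscall-batching half of the argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define PAGE_SZ 4096 /* assumed guest page size */
#define NPAGES  3    /* hypothetical number of scattered pages per request */

/* Hypothetical helper: write n scattered pages with ONE writev() call
   rather than n separate write() calls. */
static ssize_t write_pages(int fd, char *pages[], int n)
{
    struct iovec iov[NPAGES];
    assert(n > 0 && n <= NPAGES);
    for (int i = 0; i < n; i++) {
        iov[i].iov_base = pages[i];
        iov[i].iov_len = PAGE_SZ;
    }
    /* A short count is possible in general; real code would loop. */
    return writev(fd, iov, n);
}

/* Hypothetical helper: the read side, one readv() for the whole list. */
static ssize_t read_pages(int fd, char *pages[], int n)
{
    struct iovec iov[NPAGES];
    assert(n > 0 && n <= NPAGES);
    for (int i = 0; i < n; i++) {
        iov[i].iov_base = pages[i];
        iov[i].iov_len = PAGE_SZ;
    }
    return readv(fd, iov, n);
}

/* Returns 1 if NPAGES pages survive a writev()/readv() round trip
   through a temporary file, 0 otherwise. */
static int roundtrip_ok(void)
{
    char tmpl[] = "/tmp/iovec-demoXXXXXX";
    int fd = mkstemp(tmpl);
    if (fd < 0)
        return 0;
    unlink(tmpl); /* file is removed once fd is closed */

    static char src[NPAGES][PAGE_SZ], dst[NPAGES][PAGE_SZ];
    char *s[NPAGES], *d[NPAGES];
    for (int i = 0; i < NPAGES; i++) {
        memset(src[i], 'a' + i, PAGE_SZ); /* distinct fill per page */
        memset(dst[i], 0, PAGE_SZ);
        s[i] = src[i];
        d[i] = dst[i];
    }

    int ok = write_pages(fd, s, NPAGES) == (ssize_t)(NPAGES * PAGE_SZ)
          && lseek(fd, 0, SEEK_SET) == 0
          && read_pages(fd, d, NPAGES) == (ssize_t)(NPAGES * PAGE_SZ)
          && memcmp(src, dst, sizeof src) == 0;
    close(fd);
    return ok;
}
```

Calling roundtrip_ok() should return 1 on a POSIX system: three 4 KiB pages go out in one writev() and come back in one readv(), i.e. two syscalls for 12 KiB instead of six.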
So the question is how those benchmarks were run: with or without a
real readv/writev, and with or without O_DIRECT to truly eliminate all
CPU cache pollution from the memory copies?
I don't know, here's a pointer:
http://lists.gnu.org/archive/html/qemu-devel/2008-08/msg00092.html