qemu-devel

Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?


From: Avi Kivity
Subject: Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
Date: Wed, 08 Oct 2014 15:28:57 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1


On 10/08/2014 03:22 PM, Michael S. Tsirkin wrote:
On Wed, Oct 08, 2014 at 01:59:13PM +0300, Avi Kivity wrote:
On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:
Even more useful is getting rid of the desc array and instead passing descs
inline in avail and used.
You expect this to improve performance?
Quite possibly but this will have to be demonstrated.

The top vhost function in small packet workloads is vhost_get_vq_desc, and
the top instruction within that (50%) is the one that reads the first 8
bytes of desc.  It's a guaranteed cache line miss (and again on the guest
side when it's time to reuse).
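
To make the indirection concrete, here is a small Python sketch of the split-ring structures as laid out in the virtio spec: a 16-byte vring_desc, 2-byte avail ring entries and 8-byte used elements. The helper fetch_head_desc is hypothetical, not vhost code, and the avail ring is reduced to its ring[] array with the flags/idx header omitted. It shows the two dependent loads: the avail entry yields a head index, and only then can the descriptor at that index be read.

```python
import struct

# Split-virtqueue structures as laid out in the virtio spec (little-endian).
VRING_DESC = struct.Struct("<QIHH")     # addr, len, flags, next -> 16 bytes
AVAIL_ENTRY = struct.Struct("<H")       # one head index in avail->ring[] -> 2 bytes
VRING_USED_ELEM = struct.Struct("<II")  # id, len -> 8 bytes


def fetch_head_desc(avail_ring, desc_table, slot):
    """Hypothetical helper: resolve one avail slot to its head descriptor.

    The second load depends on the first: the head index read from the
    avail ring decides which 16-byte descriptor is fetched from the table.
    """
    (head,) = AVAIL_ENTRY.unpack_from(avail_ring, slot * AVAIL_ENTRY.size)
    desc = VRING_DESC.unpack_from(desc_table, head * VRING_DESC.size)
    return head, desc


# Usage: a 256-entry table with the guest's request placed at head 200.
desc_table = bytearray(VRING_DESC.size * 256)
VRING_DESC.pack_into(desc_table, 200 * VRING_DESC.size, 0x1000, 1500, 0, 0)
avail_ring = bytearray(AVAIL_ENTRY.size * 256)
AVAIL_ENTRY.pack_into(avail_ring, 0, 200)
print(fetch_head_desc(avail_ring, desc_table, 0))  # (200, (4096, 1500, 0, 0))
```

Since the guest picks the head index, consecutive avail slots can point at descriptors on unrelated cache lines, each last written by the other side.
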
OK so basically what you are pointing out is that we get 5 accesses:
read of available head, read of available ring, read of descriptor,
write of used ring, write of used ring head.
Right.  And only read of descriptor is not amortized.
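
A rough way to see the amortization argument is to count the cache lines each of the five accesses touches for one batch. The sketch below assumes 64-byte cache lines, ring arrays that start on a line boundary and do not wrap within the batch (real used rings start 4 bytes into their structure), and guest-chosen heads scattered across the descriptor table; split_ring_lines and lines_spanned are hypothetical helpers.

```python
CACHE_LINE = 64      # assumed cache line size in bytes
DESC_SIZE = 16       # vring_desc
AVAIL_ENTRY = 2      # le16 head index
USED_ELEM = 8        # vring_used_elem (id, len)


def lines_spanned(first_slot, count, entry_size):
    """Cache lines covered by `count` consecutive ring entries, assuming the
    ring array is line-aligned and the batch does not wrap."""
    if count == 0:
        return 0
    start = first_slot * entry_size // CACHE_LINE
    end = ((first_slot + count) * entry_size - 1) // CACHE_LINE
    return end - start + 1


def split_ring_lines(heads, first_slot=0):
    """Cache lines touched per structure while processing one batch."""
    n = len(heads)
    # Descriptors are reached through guest-chosen heads, so count distinct lines.
    desc_lines = {h * DESC_SIZE // CACHE_LINE for h in heads}
    return {
        "avail idx": 1,                                         # read once per batch
        "avail ring": lines_spanned(first_slot, n, AVAIL_ENTRY),
        "descriptors": len(desc_lines),                         # grows with the batch
        "used ring": lines_spanned(first_slot, n, USED_ELEM),
        "used idx": 1,                                          # written once per batch
    }
```

For 32 heads spaced 8 entries apart (heads = [8 * i for i in range(32)]), the sketch reports 1 line each for avail idx, avail ring and used idx, 4 lines for the used ring, and 32 lines for the descriptors: the descriptor reads scale with the batch while the other accesses are shared across it.
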

If processing is in-order, we could build a much simpler design, with a
valid bit in the descriptor, cleared by host as descriptors are
consumed.

Basically get rid of both the used and available rings.
That only works if you don't allow reordering, which is never the case for
block, and not the case for zero-copy net.  It also has writers on both sides
of the ring.
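
The in-order alternative can be modelled as one ring of descriptors carrying a valid bit that the guest sets and the host clears. This is a hypothetical Python sketch (Desc, InOrderRing and its methods are illustrative names), and it ignores the memory barriers a real shared-memory ring needs.

```python
from dataclasses import dataclass


@dataclass
class Desc:
    addr: int = 0
    length: int = 0
    valid: bool = False  # set by the guest when posting, cleared by the host


class InOrderRing:
    """Hypothetical single ring with a per-descriptor valid bit and no
    avail/used rings. Memory barriers are deliberately ignored."""

    def __init__(self, size):
        self.descs = [Desc() for _ in range(size)]
        self.guest_pos = 0  # next slot the guest fills (free-running)
        self.host_pos = 0   # next slot the host consumes (free-running)

    def post(self, addr, length):
        """Guest side: fill the next slot; it must have been cleared first."""
        d = self.descs[self.guest_pos % len(self.descs)]
        if d.valid:
            raise RuntimeError("ring full: host has not consumed this slot")
        d.addr, d.length, d.valid = addr, length, True
        self.guest_pos += 1

    def consume(self):
        """Host side: take the next slot in order and clear its valid bit,
        which is also how the guest learns the slot can be reused."""
        d = self.descs[self.host_pos % len(self.descs)]
        if not d.valid:
            return None
        d.valid = False
        self.host_pos += 1
        return d.addr, d.length
```

Both sides write to the same descriptor slots, and both positions advance strictly in sequence, so the model only fits devices that finish requests in the order they were posted.
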

The right design is to keep avail and used, but instead of making them rings
of pointers to descs, make them rings of descs.

The host reads descs from avail, processes them, then writes them back on
used (possibly out-of-order).  The guest writes descs to avail and reads
them back from used.
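
The proposed layout, where avail and used are rings of descriptors rather than rings of indices into a desc array, might look like the following Python model. All names are hypothetical and barriers are again ignored; the point is that the host copies each descriptor it finishes into the next used slot, so completions can be published in any order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Desc:
    addr: int
    length: int
    flags: int = 0


class InlineDescQueue:
    """Hypothetical model: avail and used hold descriptors, not indices."""

    def __init__(self, size):
        self.size = size
        self.avail = [None] * size   # written by the guest, read by the host
        self.used = [None] * size    # written by the host, read by the guest
        self.avail_idx = 0           # guest's free-running producer index
        self.used_idx = 0            # host's free-running producer index
        self.host_avail_pos = 0      # host's consumer position in avail
        self.guest_used_pos = 0      # guest's consumer position in used
        self.in_flight = 0           # posted but not yet reaped by the guest

    def guest_post(self, desc):
        if self.in_flight == self.size:
            raise RuntimeError("queue full")
        self.avail[self.avail_idx % self.size] = desc
        self.avail_idx += 1
        self.in_flight += 1

    def host_fetch(self):
        """Read every newly posted descriptor straight out of avail."""
        out = []
        while self.host_avail_pos != self.avail_idx:
            out.append(self.avail[self.host_avail_pos % self.size])
            self.host_avail_pos += 1
        return out

    def host_complete(self, desc):
        """Write a finished descriptor back into used, in any order."""
        self.used[self.used_idx % self.size] = desc
        self.used_idx += 1

    def guest_reap(self):
        out = []
        while self.guest_used_pos != self.used_idx:
            out.append(self.used[self.guest_used_pos % self.size])
            self.guest_used_pos += 1
            self.in_flight -= 1
        return out
```

For example, after the guest posts a and b and the host calls host_complete(b) before host_complete(a), guest_reap() returns [b, a]. In this model the guest can only recognize a completion by the descriptor contents, which is the gap the cookie below is meant to fill.
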

You'll probably have to add a 64-bit cookie to desc so you can complete
without an additional lookup.
My old presentation from 2012 or so suggested something like this.
We don't need a 64-bit cookie, I think; a small 16-bit one
should be enough.


A 16-bit cookie means you need an extra table to hold the real request pointers.

With a 64-bit cookie you can store a pointer to the skbuff or bio in the ring itself, and avoid the extra lookup.

The extra lookup isn't the end of the world, since it doesn't cross core boundaries, but it's worth avoiding.
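
The trade-off between the two cookie sizes can be sketched as follows. Both packed layouts are hypothetical (neither comes from a spec), and since Python has no raw pointers, the 64-bit case is modelled by carrying the request object itself where C code would store the skbuff or bio pointer. The 16-bit variant needs the side table described above; the 64-bit variant returns the request directly but, in these layouts, grows each descriptor from 16 to 24 bytes.

```python
import struct
from dataclasses import dataclass

# Hypothetical inline-descriptor layouts, little-endian, no implicit padding.
DESC_ID16 = struct.Struct("<QIHH")        # addr, len, flags, 16-bit id -> 16 bytes
DESC_COOKIE64 = struct.Struct("<QIHxxQ")  # addr, len, flags, pad, 64-bit cookie -> 24 bytes


@dataclass
class Request:
    name: str  # stands in for an skbuff or bio


class Id16Completer:
    """16-bit cookie: the guest keeps a side table mapping id -> request."""

    def __init__(self, size):
        assert size <= 1 << 16, "ids must fit in 16 bits"
        self.table = [None] * size   # the extra lookup table
        self.free = list(range(size))

    def submit(self, req):
        cookie = self.free.pop()     # IndexError here means no free ids
        self.table[cookie] = req
        return cookie

    def complete(self, cookie):
        req = self.table[cookie]     # extra lookup on every completion
        self.table[cookie] = None
        self.free.append(cookie)
        return req


class Cookie64Completer:
    """64-bit cookie: the descriptor carries the request handle itself."""

    def submit(self, req):
        return req

    def complete(self, cookie):
        return cookie                # no table to consult
```
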