
Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [QA-virtio]:Why vring size is limited to 1024?
Date: Wed, 8 Oct 2014 15:22:13 +0300

On Wed, Oct 08, 2014 at 01:59:13PM +0300, Avi Kivity wrote:
> 
> On 10/08/2014 01:55 PM, Michael S. Tsirkin wrote:
> >>>>Even more useful is getting rid of the desc array and instead passing
> >>>>descs inline in avail and used.
> >>>You expect this to improve performance?
> >>>Quite possibly but this will have to be demonstrated.
> >>>
> >>The top vhost function in small packet workloads is vhost_get_vq_desc, and
> >>the top instruction within that (50%) is the one that reads the first 8
> >>bytes of desc.  It's a guaranteed cache line miss (and again on the guest
> >>side when it's time to reuse).
> >OK so basically what you are pointing out is that we get 5 accesses:
> >read of available head, read of available ring, read of descriptor,
> >write of used ring, write of used ring head.
> 
> Right.  And only read of descriptor is not amortized.
> 
> >If processing is in-order, we could build a much simpler design, with a
> >valid bit in the descriptor, cleared by host as descriptors are
> >consumed.
> >
> >Basically get rid of both used and available ring.
> 
> That only works if you don't allow reordering, which is never the case for
> block, and not the case for zero-copy net.  It also has writers on both sides
> of the ring.
> 
> The right design is to keep avail and used, but instead of making them rings
> of pointers to descs, make them rings of descs.
> 
> The host reads descs from avail, processes them, then writes them back on
> used (possibly out-of-order).  The guest writes descs to avail and reads
> them back from used.
> 
> You'll probably have to add a 64-bit cookie to desc so you can complete
> without an additional lookup.

My old presentation from 2012 or so suggested something like this.
I don't think we need a 64-bit cookie: a small 16-bit one
should be enough.
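
To make the idea concrete, here is a rough sketch in C of what rings of
descriptors with a 16-bit cookie could look like. It is an illustration
only, not the virtio layout or a proposed ABI: struct inline_desc,
struct inline_ring, ring_push, ring_pop, out_of_order_demo and RING_SIZE
are all hypothetical names, and the code is single-threaded, with no
memory barriers, notifications or chaining. The cookie simply takes the
slot that the 16-bit next field occupies in today's struct vring_desc,
so the descriptor stays at 16 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inline descriptor: stored directly in the avail/used rings
 * rather than in a separate descriptor table. */
struct inline_desc {
    uint64_t addr;   /* guest-physical buffer address */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* per-descriptor flags, unused in this sketch */
    uint16_t cookie; /* opaque request id chosen by the guest */
};

static_assert(sizeof(struct inline_desc) == 16, "descriptor should stay 16 bytes");

#define RING_SIZE 8 /* illustrative only; must be a power of two */

struct inline_ring {
    struct inline_desc ring[RING_SIZE];
    uint16_t head; /* free-running producer index */
    uint16_t tail; /* free-running consumer index */
};

/* Producer side: copy a whole descriptor into the ring. Returns -1 if full. */
int ring_push(struct inline_ring *r, struct inline_desc d)
{
    if ((uint16_t)(r->head - r->tail) == RING_SIZE)
        return -1;
    r->ring[r->head % RING_SIZE] = d;
    r->head++;
    return 0;
}

/* Consumer side: copy the oldest descriptor out. Returns -1 if empty. */
int ring_pop(struct inline_ring *r, struct inline_desc *d)
{
    if (r->head == r->tail)
        return -1;
    *d = r->ring[r->tail % RING_SIZE];
    r->tail++;
    return 0;
}

/* Guest posts three requests on avail, the host completes them on used in
 * the order 2, 0, 1, and the guest matches each completion to its request
 * using only the cookie. Returns the number of correct matches. */
int out_of_order_demo(void)
{
    struct inline_ring avail = {0}, used = {0};
    uint32_t req_len[3] = { 64, 128, 256 }; /* guest-side request table */
    int order[3] = { 2, 0, 1 };             /* host completion order */
    struct inline_desc taken[3], done;
    int matched = 0;

    for (uint16_t i = 0; i < 3; i++) {
        struct inline_desc d = {
            .addr = 0x1000u * (i + 1), .len = req_len[i],
            .flags = 0, .cookie = i,
        };
        ring_push(&avail, d);
    }
    for (int i = 0; i < 3; i++)
        ring_pop(&avail, &taken[i]);        /* host reads descs from avail */
    for (int i = 0; i < 3; i++)
        ring_push(&used, taken[order[i]]);  /* host writes them back on used */
    while (ring_pop(&used, &done) == 0)     /* guest reads them back */
        if (done.cookie < 3 && req_len[done.cookie] == done.len)
            matched++;
    return matched;
}
```

In this sketch the cookie is just an index into a guest-side request
table, which is why 16 bits would cover any ring of up to 65536 entries.
If we assume 64-byte cache lines, four of these 16-byte descriptors
share a line.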

> >
> >Sounds good in theory.
> >
> >>Inline descriptors will amortize the cache miss over 4 descriptors, and will
> >>allow the hardware to prefetch, since the descriptors are linear in memory.
> >If descriptors are used in order (as they are with current qemu)
> >then aren't they amortized already?
> >


