qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 0/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k


From: Christian Schoenebeck
Subject: Re: [PATCH v2 0/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k
Date: Wed, 10 Nov 2021 14:14:43 +0100

On Mittwoch, 10. November 2021 11:05:50 CET Stefan Hajnoczi wrote:
> > > > So looks like it was probably still easier and realistic to just add
> > > > virtio
> > > > capabilities for now for allowing to exceed current descriptor limit.
> > > 
> > > I'm still not sure why virtio-net, virtio-blk, virtio-fs, etc perform
> > > fine under today's limits while virtio-9p needs a much higher limit to
> > > achieve good performance. Maybe there is an issue in a layer above the
> > > vring that's causing the virtio-9p performance you've observed?
> > 
> > Are you referring to (somewhat) recent benchmarks when saying those would
> > all still perform fine today?
> 
> I'm not referring to specific benchmark results. Just that none of those
> devices needed to raise the descriptor chain length, so I'm surprised
> that virtio-9p needs it because it's conceptually similar to these
> devices.

I would not say virtio-net and virtio-blk were comparable with virtio-9p and 
virtio-fs. virtio-9p and virtio-fs are fully fledged file servers which must 
perform various controller tasks before handling the actually requested I/O 
task, which inevitably adds latency to each request, whereas virtio-net and 
virtio-blk are just very thin layers that do not have that controller task 
overhead per request. And a network device only needs to handle very small 
messages in the first place.

> > Vivek was running detailed benchmarks for virtiofs vs. 9p:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-12/msg02704.html
> > 
> > For the virtio aspect discussed here, only the benchmark configurations
> > without cache are relevant (9p-none, vtfs-none) and under this aspect the
> > situation seems to be quite similar between 9p and virtio-fs. You'll also
> > note once DAX is enabled (vtfs-none-dax) that apparently boosts virtio-fs
> > performance significantly, which however seems to corelate to numbers
> > when I am running 9p with msize > 300k. Note: Vivek was presumably
> > running 9p effecively with msize=300k, as this was the kernel limitation
> > at that time.
> Agreed, virtio-9p and virtiofs are similar without caching.
> 
> I think we shouldn't consider DAX here since it bypasses the virtqueue.

DAX bypasses virtio, sure, but the performance boost you get with DAX actually 
shows the limiting factor with virtio pretty well.

> > To bring things into relation: there are known performance aspects in 9p
> > that can be improved, yes, both on Linux kernel side and on 9p server
> > side in QEMU. For instance 9p server uses coroutines [1] and currently
> > dispatches between worker thread(s) and main thread too often per request
> > (partly addressed already [2], but still WIP), which accumulates to
> > overall latency. But Vivek was actually using a 9p patch here which
> > disabled coroutines entirely, which suggests that the virtio transmission
> > size limit still represents a bottleneck.
> 
> These results were collected with 4k block size. Neither msize nor the
> descriptor chain length limits will be stressed, so I don't think these
> results are relevant here.
> 
> Maybe a more relevant comparison would be virtio-9p, virtiofs, and
> virtio-blk when block size is large (e.g. 1M). The Linux block layer in
> the guest will split virtio-blk requests when they exceed the block
> queue limits.

I am sorry, I cannot spend time for more benchmarks like that. For really 
making fair comparisons I would need to review all their code on both ends, 
adjust configuration/sources, etc.

I do think that I performed enough benchmarks and tests to show that 
increasing the transmission size can significantly improve performance with 
9p, and that allowing to exceed the queue size does make sense even for small 
transmission sizes (e.g. max. active requests on 9p server side vs. max. 
transmission size per request).

The reason for the performance gain is the minimum latency involved per 
request, and like I said, that can be improved to a certain extent with 9p, 
but that will take a long time and it could not be eliminated entirely.

As you are apparently reluctant for changing the virtio specs, what about 
introducing those discussed virtio capabalities either as experimental ones 
without specs changes, or even just as 9p specific device capabilities for 
now. I mean those could be revoked on both sides at any time anyway.

Best regards,
Christian Schoenebeck





reply via email to

[Prev in Thread] Current Thread [Next in Thread]