From: Stefan Hajnoczi
Subject: Re: [PATCH v5 0/5] virtio-pci: enable blk and scsi multi-queue by default
Date: Wed, 8 Jul 2020 13:59:46 +0100

On Wed, Jul 08, 2020 at 06:59:41AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 06, 2020 at 02:56:45PM +0100, Stefan Hajnoczi wrote:
> > v4:
> >  * Sorry for the long delay. I considered replacing this series with a
> >    simpler approach. Real hardware ships with a fixed number of queues
> >    (e.g. 128). The equivalent can be done in QEMU too. That way we don't
> >    need to magically size num_queues. In the end I decided against this
> >    approach because the Linux virtio_blk.ko and virtio_scsi.ko guest
> >    drivers unconditionally initialized all available queues until recently
> >    (it was written with num_queues=num_vcpus in mind). It doesn't make
> >    sense for a 1 CPU guest to bring up 128 virtqueues (waste of resources
> >    and possibly weird performance effects with blk-mq).
> >  * Honor maximum number of MSI-X vectors and virtqueues [Daniel Berrange]
> >  * Update commit descriptions to mention maximum MSI-X vector and virtqueue
> >    caps [Raphael]
> > v3:
> >  * Introduce virtio_pci_optimal_num_queues() helper to enforce
> >    VIRTIO_QUEUE_MAX in one place
> >  * Use VIRTIO_SCSI_VQ_NUM_FIXED constant in all cases [Cornelia]
> >  * Update hw/core/machine.c compat properties for QEMU 5.0 [Michael]
> > v3:
> >  * Add new performance results that demonstrate the scalability
> >  * Mention that this is PCI-specific [Cornelia]
> > v2:
> >  * Let the virtio-DEVICE-pci device select num-queues because the optimal
> >    multi-queue configuration may differ between virtio-pci, virtio-mmio, and
> >    virtio-ccw [Cornelia]
> > 
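(Not from the series itself: below is a rough, self-contained sketch of how a
helper along the lines of the virtio_pci_optimal_num_queues() mentioned in the
changelog above might clamp the default queue count to the MSI-X vector and
VIRTIO_QUEUE_MAX limits. The constant, names, and the exact MSI-X accounting
are assumptions for illustration; the real helper in the patches may differ.)

  /* Hypothetical sketch only; names and limits are assumptions. */
  #include <stdio.h>

  #define VIRTIO_QUEUE_MAX 1024   /* assumed per-device virtqueue cap */

  static unsigned optimal_num_queues(unsigned num_vcpus,
                                     unsigned fixed_vqs,   /* e.g. virtio-scsi ctrl+event */
                                     unsigned msix_vectors)
  {
      unsigned n = num_vcpus;     /* ideal: one request queue per vCPU */

      /* Leave room for the fixed virtqueues plus the config vector. */
      if (msix_vectors > fixed_vqs + 1) {
          unsigned by_msix = msix_vectors - fixed_vqs - 1;
          if (n > by_msix) {
              n = by_msix;
          }
      } else {
          n = 1;
      }

      /* Never exceed the transport-wide virtqueue limit. */
      if (n + fixed_vqs > VIRTIO_QUEUE_MAX) {
          n = VIRTIO_QUEUE_MAX - fixed_vqs;
      }
      return n;
  }

  int main(void)
  {
      /* 32 vCPUs, 2 fixed virtqueues, 35 MSI-X vectors -> prints 32 */
      printf("%u\n", optimal_num_queues(32, 2, 35));
      return 0;
  }
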
> > Enabling multi-queue on virtio-pci storage devices improves performance on
> > SMP guests because the completion interrupt is handled on the vCPU that
> > submitted the I/O request.  This avoids IPIs inside the guest.
> > 
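(Illustration only, not guest driver or QEMU code from this series: with one
virtqueue per vCPU, and each queue's MSI-X vector affinitized to the
corresponding vCPU, completions arrive on the same CPU that submitted the
request, so no cross-CPU notification is needed. The mapping below is a
simplified stand-in for what blk-mq does.)

  /* Hypothetical illustration; not actual driver code. */
  #include <stdio.h>

  /* Simplified blk-mq style mapping from submitting CPU to hardware queue.
   * With num_queues == num_vcpus the mapping is 1:1, so the queue's interrupt
   * fires on the vCPU that submitted the request. */
  static unsigned queue_for_cpu(unsigned cpu, unsigned num_queues)
  {
      return cpu % num_queues;
  }

  int main(void)
  {
      const unsigned num_vcpus = 4;

      for (unsigned cpu = 0; cpu < num_vcpus; cpu++) {
          printf("vCPU %u: num-queues=%u -> vq %u, num-queues=1 -> vq %u\n",
                 cpu, num_vcpus, queue_for_cpu(cpu, num_vcpus),
                 queue_for_cpu(cpu, 1));
      }
      return 0;
  }
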
> > Note that performance is unchanged in these cases:
> > 1. Uniprocessor guests.  They don't have IPIs.
> > 2. Application threads might be scheduled on the sole vCPU that handles
> >    completion interrupts purely by chance.  (This is one reason why
> >    benchmark results can vary noticeably between runs.)
> > 3. Users may bind the application to the vCPU that handles completion
> >    interrupts.
> > 
> > Set the number of queues to the number of vCPUs by default on virtio-blk and
> > virtio-scsi PCI devices.  Older machine types continue to default to 1 queue
> > for live migration compatibility.
> > 
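(Again an illustration rather than the actual patch: keeping older machine
types on 1 queue is done with compat properties in hw/core/machine.c. The
struct, array name, and property spellings below are simplified assumptions,
not QEMU's real tables.)

  /* Hypothetical illustration of the compat-property idea. */
  #include <stdio.h>

  typedef struct {
      const char *driver;
      const char *property;
      const char *value;
  } CompatProperty;

  /* Older machine types keep the old default of a single queue so that
   * live migration between QEMU versions sees identical devices. */
  static const CompatProperty compat_single_queue[] = {
      { "virtio-blk-pci",  "num-queues", "1" },
      { "virtio-scsi-pci", "num_queues", "1" },
  };

  int main(void)
  {
      for (size_t i = 0;
           i < sizeof(compat_single_queue) / sizeof(compat_single_queue[0]);
           i++) {
          printf("%s.%s=%s\n", compat_single_queue[i].driver,
                 compat_single_queue[i].property, compat_single_queue[i].value);
      }
      return 0;
  }
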
> > Random read performance:
> >       IOPS
> > q=1    78k
> > q=32  104k  +33%
> > 
> > Boot time:
> >       Duration
> > q=1        51s
> > q=32     1m41s  +98%
> > 
> > Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks
> > 
> > Previously measured results on a 4 vCPU guest were also positive but showed
> > a smaller 1-4% performance improvement.  They are no longer valid because
> > significant event loop optimizations have been merged.
> 
> I'm guessing this should be deferred to the next release as
> it (narrowly) missed the freeze window. Does this make sense to you?

Yes, that is fine. Thanks!

Stefan
