From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v3 00/10] aio: experimental virtio-blk polling mode
Date: Wed, 23 Nov 2016 09:51:31 +0000
User-agent: Mutt/1.7.1 (2016-10-04)

On Tue, Nov 22, 2016 at 08:21:16PM +0100, Christian Borntraeger wrote:
> On 11/22/2016 05:31 PM, Stefan Hajnoczi wrote:
> > v3:
> >  * Avoid ppoll(2)/epoll_wait(2) if polling succeeded [Paolo]
> >  * Disable guest->host virtqueue notification during polling [Christian]
> >  * Rebased on top of my virtio-blk/scsi virtqueue notification disable
> >    patches
> > 
> > v2:
> >  * Uninitialized node->deleted gone [Fam]
> >  * Removed 1024 polling loop iteration qemu_clock_get_ns() optimization
> >    which created a weird step pattern [Fam]
> >  * Unified with AioHandler, dropped AioPollHandler struct [Paolo]
> >    (actually I think Paolo had more in mind but this is the first step)
> >  * Only poll when all event loop resources support it [Paolo]
> >  * Added run_poll_handlers_begin/end trace events for perf analysis
> >  * Sorry, Christian, no virtqueue kick suppression yet
> > 
> > Recent performance investigation work done by Karl Rister shows that the
> > guest->host notification takes around 20 us.  This is more than the
> > "overhead" of QEMU itself (e.g. block layer).
> > 
> > One way to avoid the costly exit is to use polling instead of notification.
> > The main drawback of polling is that it consumes CPU resources.  In order
> > to benefit performance the host must have extra CPU cycles available on
> > physical CPUs that aren't used by the guest.
> > 
> > This is an experimental AioContext polling implementation.  It adds a
> > polling callback into the event loop.  Polling functions are implemented
> > for virtio-blk virtqueue guest->host kick and Linux AIO completion.
> > 
> > The QEMU_AIO_POLL_MAX_NS environment variable sets the number of
> > nanoseconds to poll before entering the usual blocking poll(2) syscall.
> > Try setting this variable to the time from old request completion to
> > new virtqueue kick.
> > 
> > By default no polling is done.  The QEMU_AIO_POLL_MAX_NS variable must
> > be set to get any polling!
> 
> The notification suppression alone gives me about 10% for a single disk
> in fio throughput. (It seems that more disks make it help less???).

In a scenario with many disks there will be lots of notifications either
way.  ioeventfd offers a form of batching because it will coalesce
multiple notifications to the same virtqueue until QEMU gets around to
reading the ioeventfd.

In other words, under heavy load ioeventfd coalesces notifications so
QEMU will process the virtqueue fewer times even though the number of
vmexits is unchanged.

Maybe this plays a role?
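
To make the coalescing concrete, here is a minimal standalone sketch (plain
Linux eventfd, not QEMU code) showing how several kicks delivered before the
handler runs collapse into a single read:

/* Minimal sketch of eventfd coalescing, not QEMU code: an eventfd is a
 * 64-bit counter, so N writes ("kicks") between reads are delivered as
 * one read whose value is N.  The handler therefore runs once for the
 * whole batch. */
#include <sys/eventfd.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = eventfd(0, 0);
    uint64_t one = 1;
    uint64_t count;

    /* Three notifications arrive before the event loop services the fd. */
    write(fd, &one, sizeof(one));
    write(fd, &one, sizeof(one));
    write(fd, &one, sizeof(one));

    /* A single read collects all pending notifications and resets the
     * counter, so the virtqueue would be processed once, not three times. */
    read(fd, &count, sizeof(count));
    printf("coalesced notifications: %llu\n", (unsigned long long)count);

    close(fd);
    return 0;
}

The number of vmexits stays the same either way; only the number of virtqueue
processing passes in QEMU goes down.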

> If I set polling to high values (e.g. 500000) then the guest->host
> notification rate basically drops to zero, so it seems to work as expected.
> Polling also seems to provide some benefit in the range of another
> 10 percent (again only for a single disk?)
> 
> So in general this looks promising. We want to keep it disabled by
> default, as it is here, until we have some grow/shrink heuristics.
> There is one thing that the kernel can do which we cannot easily do:
> check whether the CPU is contended and avoid polling in that case.
> One wild idea would be to use clock_gettime with CLOCK_THREAD_CPUTIME_ID
> and CLOCK_REALTIME and shrink the polling window if we have been
> scheduled away.
> 
> The case "number of iothreads > number of cpus" looks better than in v1.
> Have you fixed something?

v1 had a premature optimization (bug) where it ignored the precise
QEMU_AIO_POLL_MAX_NS value and instead ran in steps of 1024 polling
iterations.  Perhaps we're simply burning less CPU now since I removed
the 1024 loop granularity.

Glad that you are seeing improvements.  A self-tuning grow/shrink
heuristic is the next step so that polling can be used with real
workloads.  I'll investigate it for the next revision.
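
To sketch what I have in mind (and to fold in your clock_gettime idea),
something along these lines might work.  All names and constants below
(PollState, POLL_GROW, POLL_SHRINK, do_polling_for, the 2x wall/CPU
threshold) are made up for illustration and are not part of this series:

/* Rough sketch only, not the eventual implementation.  Idea: grow the
 * polling window when polling finds work, shrink it when it does not,
 * and drop to zero when wall-clock time ran far ahead of thread CPU
 * time, i.e. the thread was scheduled away. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define POLL_GROW   2          /* multiply window on success */
#define POLL_SHRINK 2          /* divide window on failure */

typedef struct {
    int64_t poll_ns;           /* current polling window */
    int64_t poll_max_ns;       /* ceiling, e.g. from QEMU_AIO_POLL_MAX_NS */
} PollState;

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Hypothetical stand-in for the real busy-wait poll loop; returns true
 * if a poll handler reported progress within the window. */
static bool do_polling_for(int64_t ns)
{
    (void)ns;
    return false;
}

static void poll_adjust(PollState *s, bool succeeded,
                        int64_t wall_ns, int64_t cpu_ns)
{
    if (wall_ns > 2 * cpu_ns) {
        /* Scheduled away: polling only burned someone else's time slice. */
        s->poll_ns = 0;
    } else if (succeeded) {
        s->poll_ns = s->poll_ns ? s->poll_ns * POLL_GROW : 1000;
        if (s->poll_ns > s->poll_max_ns) {
            s->poll_ns = s->poll_max_ns;
        }
    } else {
        s->poll_ns /= POLL_SHRINK;
    }
}

static bool run_poll(PollState *s)
{
    /* CLOCK_MONOTONIC rather than CLOCK_REALTIME so that clock
     * adjustments cannot fake a descheduling event. */
    struct timespec wall0, cpu0, wall1, cpu1;
    bool succeeded;

    clock_gettime(CLOCK_MONOTONIC, &wall0);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu0);

    succeeded = do_polling_for(s->poll_ns);

    clock_gettime(CLOCK_MONOTONIC, &wall1);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu1);

    poll_adjust(s, succeeded,
                ts_to_ns(&wall1) - ts_to_ns(&wall0),
                ts_to_ns(&cpu1) - ts_to_ns(&cpu0));
    return succeeded;
}

int main(void)
{
    PollState s = { .poll_ns = 0, .poll_max_ns = 500000 };
    run_poll(&s);
    printf("poll_ns after one round: %lld\n", (long long)s.poll_ns);
    return 0;
}

The grow/shrink factors and the 2x wall/CPU threshold are arbitrary
placeholders; whatever lands upstream will need measurement-driven tuning.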
