Re: [Qemu-devel] [regression] dataplane: throughput -40% by commit 580b6


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [regression] dataplane: throughput -40% by commit 580b6b2aa2
Date: Wed, 02 Jul 2014 12:23:02 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 02/07/2014 12:01, Kevin Wolf wrote:
On 02.07.2014 at 11:48, Paolo Bonzini wrote:
On 02/07/2014 11:39, Kevin Wolf wrote:
On 02.07.2014 at 11:13, Paolo Bonzini wrote:
I don't think starting with that fast path as _the_ solution is a good
idea. It would essentially restrict dataplane to the scenarios that used
to work well in 2.0 - just look at what the block.c read/write functions
do: no image formats, (almost?) no block jobs, no 4k sector support, no
writethrough mode, no zero detection, no throttling, no nothing.
Anything we want to keep while using the fast path we would have to
duplicate there.

You're entirely right (I wouldn't duplicate it though, I would just
sacrifice it).  But I think none of the bullets apply in maximum
performance situations, and fast paths are okay as long as they are
picked dynamically at run-time.

Fast paths are okay if there is no way to achieve the same performance
without them, but I'm not entirely convinced of that yet in our specific
case.

Another idea is to skip aio_notify() when we're sure the event loop
isn't blocked in g_poll().  Doing this in a thread-safe and lockless way
might be tricky though.
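
A minimal standalone sketch of that idea (not QEMU code; POSIX poll() stands in for g_poll(), the Loop type, in_poll flag and self-pipe wakeup are invented here for illustration, and the re-check noted in the comment is exactly where the lockless part gets tricky):

#include <poll.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

typedef struct {
    atomic_bool in_poll;  /* set only while the loop may block in poll() */
    int wakeup_fd[2];     /* self-pipe used to interrupt poll() */
} Loop;

/* Notifier side: skip the wakeup write unless the loop might be asleep. */
static void loop_notify(Loop *l)
{
    if (atomic_load(&l->in_poll)) {
        char c = 0;
        (void)write(l->wakeup_fd[1], &c, 1);
    }
}

/* Loop side: publish "I may block now", then poll. */
static void loop_poll_once(Loop *l, struct pollfd *fds, nfds_t nfds, int timeout)
{
    atomic_store(&l->in_poll, true);
    /* A real implementation must re-check for pending work here, after the
     * store above, or a notification racing with it could be missed. */
    poll(fds, nfds, timeout);
    atomic_store(&l->in_poll, false);
}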

Yes, good idea for 2.2 but not now.

As a first approximation, isn't it unnecessary when we're already
running in the thread with the AIO main loop? (Which pretty much means
always with dataplane.) Or can it be required even when we don't switch
to a different thread?

That's not even that much of an approximation.  I think it's pretty
much the definition of when it's unnecessary.  Clever!

Probably not quite, because the AIO main loop thread might be doing
something else at the moment and would come back to handling things in
its main loop even without being notified.

But then aio_context_prepare (for the main iothread) or aio_poll (for dataplane threads) would check for bottom halves.

The problem is how you define the AIO main loop. One way is "who has the aio context lock", but sooner or later we will want to get rid of the "Big Dataplane Lock" that is aio_context_acquire/release. It's very hard otherwise to avoid lock inversion deadlocks in virtio-scsi-dataplane (which will likely use dma-helpers.c, not address_space_rw).

But it's probably close enough in practice.

An approximation is "it's unnecessary if we hold the aio context
lock".  That is also always the case with dataplane, but
never with non-dataplane (the main loop bypasses
aio_context_acquire/release). Adding rfifolock_is_owned is trivial.
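
A hedged sketch of that check (the RFifoLock layout shown here is assumed, not QEMU's actual structure, and notify_if_needed is a made-up wrapper illustrating where the skip would go):

#include <pthread.h>
#include <stdbool.h>

/* Assumed shape of a recursive FIFO lock: an owner thread plus a nesting count. */
typedef struct {
    pthread_mutex_t lock;
    pthread_t owner;      /* meaningful only while nesting > 0 */
    unsigned nesting;     /* recursion depth of the current owner */
} RFifoLock;

/* "Does the calling thread hold this lock?"  Checked without taking the
 * mutex: if the caller is the owner, the fields cannot change under it;
 * a real implementation would use the project's own thread-identity
 * helpers to make the unlocked read well-defined. */
static bool rfifolock_is_owned(RFifoLock *r)
{
    return r->nesting > 0 && pthread_equal(r->owner, pthread_self());
}

/* The approximation above: only kick the event loop when the caller does
 * not already hold the AioContext lock. */
static void notify_if_needed(RFifoLock *ctx_lock, void (*do_notify)(void *), void *opaque)
{
    if (!rfifolock_is_owned(ctx_lock)) {
        do_notify(opaque);
    }
}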

Is the fix for the main loop as simple as just adding the acquire/
release pair, or does it involve more than that?

Yes, it should be. But see above about the possible short life of the aio context lock.
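
For concreteness, the shape of that change would be roughly the following (a sketch against QEMU 2.x internals, not a tested patch; the real main loop dispatches the default AioContext through its glib integration rather than a bare aio_poll(), and main_loop_dispatch_with_lock is a hypothetical helper):

#include "qemu/main-loop.h"   /* qemu_get_aio_context() */
#include "block/aio.h"        /* aio_context_acquire/release, aio_poll() */

/* Run one iteration of the main loop's default AioContext with the
 * context lock held, so the "lock is held" approximation also covers
 * the non-dataplane path. */
static void main_loop_dispatch_with_lock(void)
{
    AioContext *ctx = qemu_get_aio_context();

    aio_context_acquire(ctx);
    aio_poll(ctx, true);
    aio_context_release(ctx);
}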

I would really prefer if the optimisations we apply for dataplane would
work even in the traditional case, improving the block layer as a whole
instead of just special cases.

I agree. But my hope is to get there by removing more "special" parts of dataplane, since I consider the aio context lock to be one.

Paolo


