From: Paolo Bonzini
Subject: [Qemu-devel] [RFC PATCH resend 00/11] Make CoMutex/CoQueue/CoRwlock thread-safe
Date: Fri, 15 Apr 2016 13:31:55 +0200
[this time including the mailing list]
This is yet another tiny bit of the multiqueue work, this time affecting
the synchronization infrastructure for coroutines. Currently, coroutines
synchronize between the main I/O thread and the dataplane iothread through
the AioContext lock. However, for multiqueue a single BDS will be used
by multiple iothreads and hence multiple AioContexts. This calls for
a different approach to coroutine synchronization. This series is my
first attempt at it. Besides multiqueue, I think throttling groups
(which can already span multiple AioContexts) could also take advantage
of the new CoMutexes.
The series has two main parts:
- it makes CoMutex bind coroutines to an AioContext. It is of course
still possible to move coroutines around with explicit yield and
qemu_coroutine_enter, but a CoMutex will now reenter the coroutine
where it was running before, rather than in the AioContext of
the previous owner. To do this, a new function aio_co_schedule is
introduced to run a coroutine on a given iothread. I think this could
be used in other places too; for now it lets a CoMutex protect stuff
across multiple AioContexts without them moving around(*). Of course
currently a CoMutex is generally not used across multiple iothreads,
because you have to acquire/release the AioContext around CoMutex
critical sections. However...
- ... the second change is exactly to make CoMutex thread-safe and remove
the need for a "thread-based" mutex around it. The new CoMutex is
exactly the same as a mutex implementation that you'd find in an
operating system. iothreads are the moral equivalent of CPUs in
a kernel, while coroutines resemble kernel threads running without
preemption on a CPU. Even if you have to take concurrency into account,
the lack of preemption while running coroutines or bottom halves
keeps the complexity at bay. For example, it is easy to synchronize
between qemu_co_mutex_lock's yield and the qemu_coroutine_enter in
aio_co_schedule's bottom half.
Same as before, CoMutex puts coroutines to sleep with
qemu_coroutine_yield and wakes them up with the new aio_co_schedule.
I could have wrapped CoMutex's CoQueue with a "regular" thread mutex or
spinlock. The resulting code would have looked a lot like RFifoLock
(with CoQueue replacing RFifoLock's condition variable). Instead,
inspired by the parallel between coroutines and non-preemptive kernel
threads, I chose to adopt the same lock-free mutex algorithm as OSv.
The algorithm only needs two to four atomic ops for a lock-unlock pair
(two when uncontended). To cover CoQueue, each CoQueue is made to
depend on a CoMutex, similar to condition variables. Most CoQueues
already have a corresponding CoMutex so this is not a big deal;
converting the others is left for a future series. I did this because
CoQueue looks a lot like OSv's waitqueues; so if necessary, we could
even take OSv's support for wait morphing (which avoids the thundering
herd problem) and add it to CoMutex and CoQueue. This may be useful
when making tracked_requests thread-safe.
Kevin: this has nothing to do with my old plan from Brno, and it's
possibly a lot closer to what you wanted. Your idea was to automatically
release the "thread mutex" when a coroutine yields; I think you should
be fine with not having a thread mutex at all!
This will need some changes in the formats because, for multiqueue,
CoMutexes would need to be used like "real" thread mutexes. Code like
this:
    ...
    qemu_co_mutex_unlock()
    ...   /* still access shared data, but don't yield */
    qemu_coroutine_yield()
might be required to use this other pattern:
    ...   /* access shared data, but don't yield */
    qemu_co_mutex_unlock()
    qemu_coroutine_yield()
because "adding a second CPU" is already introducing concurrency that
wasn't there before. The "non-preemptive multitasking" reference only
applies to things that access AioContext-local data. This includes the
synchronization primitives implemented in this series, the thread pool,
the Linux AIO support, but not much else. It still simplifies _those_
though. :)
Anyhow, we'll always have some BlockDrivers that do not support multiqueue,
such as the network protocols. Thus it would be possible to handle the
formats one at a time. raw-posix, raw and qcow2 would already form a
pretty good set, and the first two do not use CoMutex at all.
The patch has quite a lot of new code, but about half of it is testcases.
The new test covers correctness and (when run with --verbose) also takes a
stab at measuring performance; the result is that the performance of CoMutex
is comparable to pthread mutexes. The only snag is that you cannot
make a direct comparison between CoMutex (fair) and pthread_mutex_t
(unfair). For this reason the testcase also measures performance of a
quick-and-dirty implementation of a fair mutex, based on MCS locks +
futexes.
There's a lot of meat in the above text, and I hope it will make the code
clearer and compensate for the terse commit messages. :) I'll probably
write a single-threaded testcase too, just to provide some more unit
test comparison of "before" and "after".
I haven't even started a guest with these patches, let alone run
qemu-iotests... generally the changes are well confined to unit tested
code, patch 2 for example is completely untested. There are a couple
other places that at least need more comments, but I wanted to throw
the patch out for an early review of the general approach.
Paolo Bonzini (11):
coroutine: use QSIMPLEQ instead of QTAILQ
throttle-groups: restart throttled requests from coroutine context
coroutine: delete qemu_co_enter_next
aio: introduce aio_co_schedule
coroutine-lock: reschedule coroutine on the AioContext it was running on
coroutine-lock: make CoMutex thread-safe
coroutine-lock: add limited spinning to CoMutex
test-aio-multithread: add performance comparison with thread-based mutexes
coroutine-lock: place CoMutex before CoQueue in header
coroutine-lock: add mutex argument to CoQueue APIs
coroutine-lock: make CoRwlock thread-safe and fair
async.c | 38 ++++
block/backup.c | 2 +-
block/io.c | 2 +-
block/qcow2-cluster.c | 4 +-
block/sheepdog.c | 2 +-
block/throttle-groups.c | 71 +++++--
hw/9pfs/9p.c | 2 +-
include/block/aio.h | 14 ++
include/qemu/atomic.h | 3 +
include/qemu/coroutine.h | 106 ++++++----
include/qemu/coroutine_int.h | 14 +-
iothread.c | 16 ++
stubs/iothread.c | 11 ++
tests/Makefile | 11 +-
tests/iothread.c | 107 ++++++++++
tests/iothread.h | 25 +++
tests/test-aio-multithread.c | 452 +++++++++++++++++++++++++++++++++++++++++++
trace-events | 8 +-
util/qemu-coroutine-lock.c | 257 ++++++++++++++++++++----
util/qemu-coroutine.c | 2 +-
20 files changed, 1037 insertions(+), 110 deletions(-)
create mode 100644 tests/iothread.c
create mode 100644 tests/iothread.h
create mode 100644 tests/test-aio-multithread.c
--
2.5.5