From: Paolo Bonzini
Subject: [Qemu-devel] [RFC PATCH resend 00/11] Make CoMutex/CoQueue/CoRwlock thread-safe
Date: Fri, 15 Apr 2016 13:31:55 +0200

[this time including the mailing list]

This is yet another tiny bit of the multiqueue work, this time affecting
the synchronization infrastructure for coroutines.  Currently, coroutines
synchronize between the main I/O thread and the dataplane iothread through
the AioContext lock.  However, for multiqueue a single BDS will be used
by multiple iothreads and hence multiple AioContexts.  This calls for
a different approach to coroutine synchronization.  This series is my
first attempt at it.  Besides multiqueue, I think throttling groups
(which can already span multiple AioContexts) could also take advantage
of the new CoMutexes.

The series has two main parts:

- it makes CoMutex bind coroutines to an AioContext.  It is of course
  still possible to move coroutines around with explicit yield and
  qemu_coroutine_enter, but a CoMutex will now reenter the coroutine
  where it was running before, rather than in the AioContext of
  the previous owner.  To do this, a new function aio_co_schedule is
  introduced to run a coroutine on a given iothread; a sketch of the
  intended behavior follows this list.  I think this could be used in
  other places too; for now it lets a CoMutex protect stuff
  across multiple AioContexts without them moving around(*).  Of course
  currently a CoMutex is generally not used across multiple iothreads,
  because you have to acquire/release the AioContext around CoMutex
  critical sections.  However...

- ... the second change is exactly to make CoMutex thread-safe and remove
  the need for a "thread-based" mutex around it.  The new CoMutex is
  exactly the same as a mutex implementation that you'd find in an
  operating system.  iothreads are the moral equivalent of CPUs in
  a kernel, while coroutines resemble kernel threads running without
  preemption on a CPU.  Even if you have to take concurrency into account,
  the lack of preemption while running coroutines or bottom halves
  keeps the complexity at bay.  For example, it is easy to synchronize
  between qemu_co_mutex_lock's yield and the qemu_coroutine_enter in
  aio_co_schedule's bottom half.

  Same as before, CoMutex puts coroutines to sleep with
  qemu_coroutine_yield and wakes them up with the new aio_co_schedule.
  I could have wrapped CoMutex's CoQueue with a "regular" thread mutex or
  spinlock.  The resulting code would have looked a lot like RFifoLock
  (with CoQueue replacing RFifoLock's condition variable).  Instead,
  inspired by the parallel between coroutines and non-preemptive kernel
  threads, I chose to adopt the same lock-free mutex algorithm as OSv;
  a sketch of its fast path also follows this list.  The algorithm only
  needs two to four atomic ops for a lock-unlock pair (two when
  uncontended).  To cover CoQueue, each CoQueue is made to
  depend on a CoMutex, similar to condition variables.  Most CoQueues
  already have a corresponding CoMutex so this is not a big deal;
  converting the others is left for a future series.  I did this because
  CoQueue looks a lot like OSv's waitqueues; so if necessary, we could
  even take OSv's support for wait morphing (which avoids the thundering
  herd problem) and add it to CoMutex and CoQueue.  This may be useful
  when making tracked_requests thread-safe.
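
To illustrate the first point, here is roughly what a caller would see
(a sketch only, not code from the series; I am assuming the signature
void aio_co_schedule(AioContext *ctx, Coroutine *co), and SharedState
is a made-up structure):

    /* A coroutine that contends on a CoMutex shared across iothreads.
     * If the lock is contended, qemu_co_mutex_lock yields; the unlocker
     * later calls aio_co_schedule with this coroutine and its home
     * AioContext, so it resumes below in the AioContext it was running
     * on, not in the owner's AioContext.
     */
    typedef struct SharedState {
        CoMutex lock;
        int counter;
    } SharedState;

    static void coroutine_fn worker(void *opaque)
    {
        SharedState *s = opaque;

        qemu_co_mutex_lock(&s->lock);    /* may yield if contended */
        s->counter++;                    /* back on our own AioContext */
        qemu_co_mutex_unlock(&s->lock);
    }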
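
The second point, the lock-free fast path in the style of OSv's mutex,
boils down to something like this (again just a sketch, not the code in
the patches; the slow-path helpers are made-up names):

    void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex)
    {
        /* "locked" counts lockers.  Going 0 -> 1 means we took the
         * mutex uncontended: one atomic op, no thread mutex, no
         * spinning.
         */
        if (atomic_fetch_inc(&mutex->locked) == 0) {
            return;
        }
        /* Contended: push ourselves on a lock-free queue and yield; the
         * unlocker hands the mutex directly to the oldest waiter.
         */
        qemu_co_mutex_lock_slowpath(mutex);     /* made-up helper name */
    }

    void coroutine_fn qemu_co_mutex_unlock(CoMutex *mutex)
    {
        if (atomic_fetch_dec(&mutex->locked) == 1) {
            return;                             /* nobody was waiting */
        }
        /* Hand off: wake the oldest waiter with aio_co_schedule, in the
         * AioContext it was running on before it went to sleep.
         */
        qemu_co_mutex_wake_next(mutex);         /* made-up helper name */
    }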

Kevin: this has nothing to do with my old plan from Brno, and it's
possibly a lot closer to what you wanted.  Your idea was to automatically
release the "thread mutex" when a coroutine yields; I think you should
be fine with not having a thread mutex at all!

This will need some changes in the formats because, for multiqueue,
CoMutexes would need to be used like "real" thread mutexes.  Code like
this:

    ...
    qemu_co_mutex_unlock()
    ... /* still access shared data, but don't yield */
    qemu_coroutine_yield()

might be required to use this other pattern:

    ... /* access shared data, but don't yield */
    qemu_co_mutex_unlock()
    qemu_coroutine_yield()

because "adding a second CPU" is already introducing concurrency that
wasn't there before.  The "non-preemptive multitasking" reference only
applies to things that access AioContext-local data.  This includes the
synchronization primitives implemented in this series, the thread pool,
the Linux AIO support, but not much else.  It still simplifies _those_
though. :)
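
For example (made-up function and field names, just to show where the
boundary lies):

    typedef struct FooState {
        CoMutex lock;
        int nb_dirty;
    } FooState;

    static void coroutine_fn foo_update(FooState *s)
    {
        qemu_co_mutex_lock(&s->lock);
        s->nb_dirty++;                 /* shared data: only under the lock */
        qemu_co_mutex_unlock(&s->lock);

        /* From here on another iothread may already be running inside
         * the critical section, so s->nb_dirty is off limits; yielding
         * (or doing AioContext-local work) is fine.
         */
        qemu_coroutine_yield();
    }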

Anyhow, we'll always have some BlockDrivers that do not support multiqueue,
such as the network protocols.  Thus it would be possible to handle the
formats one at a time.  raw-posix, raw and qcow2 would already form a
pretty good set, and the first two do not use CoMutex at all.

The series has quite a lot of new code, but about half of it is testcases.
The new test covers correctness and (when run with --verbose) also takes a
stab at measuring performance; the result is that the performance of
CoMutex is comparable to pthread mutexes.  The only snag is that you cannot
make a direct comparison between CoMutex (fair) and pthread_mutex_t
(unfair).  For this reason the testcase also measures performance of a
quick-and-dirty implementation of a fair mutex, based on MCS locks +
futexes.
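
For the curious, "MCS locks + futexes" means something along these lines
(a quick standalone sketch with C11 atomics and the raw futex syscall,
not the actual code in the testcase):

    #include <stdatomic.h>
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    typedef struct MCSNode {
        _Atomic(struct MCSNode *) next;
        atomic_int wait;               /* 1 while this waiter must sleep */
    } MCSNode;

    typedef struct {
        _Atomic(MCSNode *) tail;       /* last waiter in the queue */
    } MCSLock;

    static void futex_wait(atomic_int *addr, int val)
    {
        syscall(SYS_futex, addr, FUTEX_WAIT, val, NULL, NULL, 0);
    }

    static void futex_wake_one(atomic_int *addr)
    {
        syscall(SYS_futex, addr, FUTEX_WAKE, 1, NULL, NULL, 0);
    }

    static void mcs_lock(MCSLock *lock, MCSNode *node)
    {
        atomic_store(&node->next, NULL);
        atomic_store(&node->wait, 1);
        MCSNode *prev = atomic_exchange(&lock->tail, node);
        if (prev) {
            /* Queue up behind prev and sleep until it hands the lock
             * over.  futex_wait returns immediately if node->wait is
             * already 0, so the wake-up cannot be lost.
             */
            atomic_store(&prev->next, node);
            while (atomic_load(&node->wait)) {
                futex_wait(&node->wait, 1);
            }
        }
    }

    static void mcs_unlock(MCSLock *lock, MCSNode *node)
    {
        MCSNode *next = atomic_load(&node->next);
        if (!next) {
            MCSNode *expected = node;
            if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL)) {
                return;                /* no waiters */
            }
            /* A successor has swapped itself into tail but has not
             * linked itself in yet; wait for the link to appear.
             */
            while (!(next = atomic_load(&node->next))) {
                /* spin briefly */
            }
        }
        atomic_store(&next->wait, 0);  /* hand over in FIFO order */
        futex_wake_one(&next->wait);
    }

Waiters are woken strictly in arrival order, which is what makes this a
fair comparison point for the (also fair) CoMutex.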

There's a lot of meat in the above text, and I hope it will make the code
clearer and compensate for the terse commit messages. :)  I'll probably
write a single-threaded testcase too, just to provide some more unit
test comparison of "before" and "after".

I haven't even started a guest with these patches, let alone run
qemu-iotests... generally the changes are well confined to unit-tested
code; patch 2, for example, is completely untested.  There are a couple
of other places that at least need more comments, but I wanted to throw
the series out for an early review of the general approach.

Paolo Bonzini (11):
  coroutine: use QSIMPLEQ instead of QTAILQ
  throttle-groups: restart throttled requests from coroutine context
  coroutine: delete qemu_co_enter_next
  aio: introduce aio_co_schedule
  coroutine-lock: reschedule coroutine on the AioContext it was running on
  coroutine-lock: make CoMutex thread-safe
  coroutine-lock: add limited spinning to CoMutex
  test-aio-multithread: add performance comparison with thread-based mutexes
  coroutine-lock: place CoMutex before CoQueue in header
  coroutine-lock: add mutex argument to CoQueue APIs
  coroutine-lock: make CoRwlock thread-safe and fair

 async.c                      |  38 ++++
 block/backup.c               |   2 +-
 block/io.c                   |   2 +-
 block/qcow2-cluster.c        |   4 +-
 block/sheepdog.c             |   2 +-
 block/throttle-groups.c      |  71 +++++--
 hw/9pfs/9p.c                 |   2 +-
 include/block/aio.h          |  14 ++
 include/qemu/atomic.h        |   3 +
 include/qemu/coroutine.h     | 106 ++++++----
 include/qemu/coroutine_int.h |  14 +-
 iothread.c                   |  16 ++
 stubs/iothread.c             |  11 ++
 tests/Makefile               |  11 +-
 tests/iothread.c             | 107 ++++++++++
 tests/iothread.h             |  25 +++
 tests/test-aio-multithread.c | 452 +++++++++++++++++++++++++++++++++++++++++++
 trace-events                 |   8 +-
 util/qemu-coroutine-lock.c   | 257 ++++++++++++++++++++----
 util/qemu-coroutine.c        |   2 +-
 20 files changed, 1037 insertions(+), 110 deletions(-)
 create mode 100644 tests/iothread.c
 create mode 100644 tests/iothread.h
 create mode 100644 tests/test-aio-multithread.c

-- 
2.5.5



