From: Emilio G. Cota
Subject: [Qemu-ppc] [RFC v3 0/56] per-CPU locks
Date: Thu, 18 Oct 2018 21:05:29 -0400

Cc: Aleksandar Markovic <address@hidden>
Cc: Alexander Graf <address@hidden>
Cc: Alistair Francis <address@hidden>
Cc: Andrzej Zaborowski <address@hidden>
Cc: Anthony Green <address@hidden>
Cc: Artyom Tarasenko <address@hidden>
Cc: Aurelien Jarno <address@hidden>
Cc: Bastian Koppelmann <address@hidden>
Cc: Christian Borntraeger <address@hidden>
Cc: Chris Wulff <address@hidden>
Cc: Cornelia Huck <address@hidden>
Cc: David Gibson <address@hidden>
Cc: David Hildenbrand <address@hidden>
Cc: "Edgar E. Iglesias" <address@hidden>
Cc: Eduardo Habkost <address@hidden>
Cc: Fabien Chouteau <address@hidden>
Cc: Guan Xuetao <address@hidden>
Cc: James Hogan <address@hidden>
Cc: Laurent Vivier <address@hidden>
Cc: Marek Vasut <address@hidden>
Cc: Mark Cave-Ayland <address@hidden>
Cc: Max Filippov <address@hidden>
Cc: Michael Clark <address@hidden>
Cc: Michael Walle <address@hidden>
Cc: Palmer Dabbelt <address@hidden>
Cc: Pavel Dovgalyuk <address@hidden>
Cc: Peter Crosthwaite <address@hidden>
Cc: Peter Maydell <address@hidden>
Cc: address@hidden
Cc: address@hidden
Cc: address@hidden
Cc: Richard Henderson <address@hidden>
Cc: Sagar Karandikar <address@hidden>
Cc: Stafford Horne <address@hidden>

I'm calling this series a v3 because it supersedes the two series
I previously sent about using atomics for interrupt_request:
  https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg02013.html
The approach taken there cannot work reliably: using (locked) atomics
to set interrupt_request, but plain loads to read it, can lead to
missed updates.
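
Roughly, the race looks like this (a hypothetical sketch in QEMU-style
C, not code from that series):

  /* Thread A (injector): a locked atomic RMW sets the flag... */
  atomic_or(&cpu->interrupt_request, CPU_INTERRUPT_HARD);
  qemu_cpu_kick(cpu);

  /* Thread B (vCPU): ...but a plain load reads it. The load is not
   * ordered against the surrounding halt logic, so B can decide to
   * go to sleep just as A sets the bit -- a missed update. */
  if (!(cpu->interrupt_request & CPU_INTERRUPT_HARD)) {
      /* halt */
  }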

This series takes a different approach: it serializes access to many
CPUState fields, including .interrupt_request, with a per-CPU lock.

Protecting more fields of CPUState with the lock then allows us to
replace the BQL with the per-CPU lock in many places, notably the
execution loop in cpus.c. This improves scalability for MTTCG, since
vCPU threads no longer have to acquire a contended global lock
(the BQL) every time they stop executing code.
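
Concretely, the intent is for interrupt injection to look roughly like
this (a simplified sketch; cpu_mutex_lock/unlock are the per-CPU lock
helpers this series introduces, and the wrapper name is illustrative):

  static void cpu_raise_interrupt(CPUState *cpu, int mask)
  {
      cpu_mutex_lock(cpu);             /* per-CPU lock, not the BQL */
      cpu->interrupt_request |= mask;
      qemu_cpu_kick(cpu);              /* wake the vCPU if it is halted */
      cpu_mutex_unlock(cpu);
  }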

Some hurdles that remain:

1. I am not happy with the shutdown path via pause_all_vcpus.
   What happens if
   (a) A CPU is added while we're calling pause_all_vcpus?
   (b) Some CPUs are trying to run exclusive work while we
       call pause_all_vcpus?
   Am I being overly paranoid here?

2. I have done only very light testing with x86_64 KVM, and no
   testing with the other accelerators (hvf, hax, whpx). check-qtest
   passes, except for an s390x test that appears to be broken in
   master -- I reported the problem here:
     https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg03728.html

3. This might break record-replay. A quick test with icount on
   aarch64 seems to work, but I haven't tested icount extensively.

4. Some architectures still need the BQL in cpu_has_work.
   This leads to some contortions to avoid deadlock, since in this
   series cpu_has_work is called with the CPU lock held (see the
   sketch after this list).

5. The interrupt handling path still runs with the BQL held, mostly
   because the ISAs I routinely work with need the BQL anyway when
   handling the interrupt. We can complete the pushdown of the BQL
   to .do_interrupt/.exec_interrupt later on; this series is already
   way too long.
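
To illustrate (4): with a BQL -> cpu->lock ordering, a target that
needs the BQL inside cpu_has_work cannot simply acquire it while
already holding the CPU lock. A hypothetical sketch of the contortion
(target_has_work_locked is an illustrative stand-in):

  /* Called with cpu->lock held; returns with it still held. */
  static bool cpu_has_work_with_bql(CPUState *cpu)
  {
      bool has_work;

      /* Taking the BQL here would invert the BQL -> cpu->lock order
       * and could deadlock, so drop the CPU lock first. */
      cpu_mutex_unlock(cpu);
      qemu_mutex_lock_iothread();
      cpu_mutex_lock(cpu);
      has_work = target_has_work_locked(cpu);   /* needs the BQL */
      qemu_mutex_unlock_iothread();
      return has_work;
  }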

Points (1)-(3) make this series an RFC rather than a proper patch
series. I'd appreciate feedback on the approach and/or testing.

Note that this series fixes a bug whereby cpu_has_work could be
called without the BQL (from cpu_handle_halt). After this series,
cpu_has_work is called with the CPU lock held, and only the targets
that need the BQL in cpu_has_work acquire it.
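
Schematically, the fixed halt path might look like this (a simplified
sketch; the real code also has target-specific details):

  static inline bool cpu_handle_halt(CPUState *cpu)
  {
      if (cpu->halted) {
          cpu_mutex_lock(cpu);
          /* cpu_has_work() now runs with cpu->lock held; only the
           * targets that need the BQL take it internally. */
          if (!cpu_has_work(cpu)) {
              cpu_mutex_unlock(cpu);
              return true;        /* stay halted */
          }
          cpu->halted = 0;
          cpu_mutex_unlock(cpu);
      }
      return false;
  }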

For some performance numbers, see the last patch.

The series is checkpatch-clean, apart from one warning due to the
use of __COVERITY__ in cpus.c.

You can fetch this series from:

  https://github.com/cota/qemu/tree/cpu-lock-v3

Note that it applies on top of tcg-next plus my dynamic TLB series,
which I'm using in the faint hope that the Ubuntu experiments might
run a bit faster.

Thanks!

                Emilio


