Re: [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state
From: Alex Bennée
Subject: Re: [Qemu-devel] [PATCH v6 00/12] cpu-exec: Safe work in quiescent state
Date: Mon, 05 Sep 2016 16:51:43 +0100
User-agent: mu4e 0.9.17; emacs 25.1.11
Paolo Bonzini <address@hidden> writes:
> This is not the code I promised at the beginning of the week.
> It's better, and not only in that this one works. :) Instead of
> reinventing the wheel and using the new wheel for linux-user's
> start_exclusive/end_exclusive, the linux-user/ code is moved to
> cpus-common.c and reused as the synchronization mechanism behind
> async_safe_run_on_cpu.
>
> The code works and actually satisfies our needs very well. The only
> disadvantage is that safe work items will run with a mutex taken; the
> mutex fits decently in QEMU's hierarchy however, sitting between the
> BQL and the tb_lock.
The series is looking good. I only had a few comments on comments.
So for testing I applied:
 translate-all.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)
modified translate-all.c
@@ -20,6 +20,9 @@
#include <windows.h>
#endif
#include "qemu/osdep.h"
+#include <sys/types.h>
+#include <unistd.h>
+#include <sys/syscall.h>
#include "qemu-common.h"
@@ -57,7 +60,7 @@
#include "exec/log.h"
//#define DEBUG_TB_INVALIDATE
-//#define DEBUG_FLUSH
+#define DEBUG_FLUSH
/* make various TB consistency checks */
//#define DEBUG_TB_CHECK
@@ -456,7 +459,8 @@ static inline PageDesc *page_find(tb_page_addr_t index)
indicated, this is constrained by the range of direct branches on the
host cpu, as used by the TCG implementation of goto_tb. */
#if defined(__x86_64__)
-# define MAX_CODE_GEN_BUFFER_SIZE (2ul * 1024 * 1024 * 1024)
+/* # define MAX_CODE_GEN_BUFFER_SIZE (2ul * 1024 * 1024 * 1024) */
+# define MAX_CODE_GEN_BUFFER_SIZE (256 * 1024)
#elif defined(__sparc__)
# define MAX_CODE_GEN_BUFFER_SIZE (2ul * 1024 * 1024 * 1024)
#elif defined(__powerpc64__)
@@ -835,18 +839,25 @@ static void page_flush_tb(void)
static void do_tb_flush(CPUState *cpu, void *data)
{
unsigned tb_flush_req = (unsigned) (uintptr_t) data;
-
+ unsigned tb_flush_count;
tb_lock();
/* If it's already been done on request of another CPU,
* just retry.
*/
- if (atomic_read(&tcg_ctx.tb_ctx.tb_flush_count) != tb_flush_req) {
+ tb_flush_count = atomic_read(&tcg_ctx.tb_ctx.tb_flush_count);
+ if (tb_flush_count != tb_flush_req) {
+#if defined(DEBUG_FLUSH)
+        fprintf(stderr, "%s: (%d/%ld) skipping racing flush %x/%x\n", __func__,
+                getpid(), syscall(SYS_gettid), tb_flush_req, tb_flush_count);
+#endif
goto done;
}
#if defined(DEBUG_FLUSH)
-    printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
+    fprintf(stderr, "%s: (%d/%ld, %x/%x) code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
+            __func__, getpid(), syscall(SYS_gettid),
+            tb_flush_req, tb_flush_count,
(unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.tb_ctx.nb_tbs > 0 ?
((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)) /
And then ran the test in my pigz testing image:
./tests/docker/docker.py update pigz-test:armhf ./arm-linux-user/qemu-arm
retry.py -n 20 -c -- docker run -i -t --rm pigz-test:armhf pigz data.tar
And observed the results of flushing, including racing flushes, which would
ordinarily take out the system.
It would be nice to resurrect some of the tests/tcg stuff and see if we
can stress linux-user by creating and destroying threads. I guess the
work here also replaces Emilio's RCU cpu list.
>
> For performance, the last patch changes it to avoid condition variables
> in the fast path. (There are still two memory barriers; if desired they
> could be merged with the ones in rcu_read_lock/rcu_read_unlock). I am
> including a formal model of the algorithm; together with new documentation
> in include/qom/cpu.h, it accounts for most of the added lines of code.
> Still, it is completely optional.
>
> Paolo
>
> Alex Bennée (1):
> cpus: pass CPUState to run_on_cpu helpers
>
> Paolo Bonzini (5):
> cpus-common: move CPU list management to common code
> cpus-common: move exclusive work infrastructure from linux-user
> cpus-common: always defer async_run_on_cpu work items
> cpus-common: Introduce async_safe_run_on_cpu()
> cpus-common: lock-free fast path for cpu_exec_start/end
>
> Sergey Fedorov (6):
> cpus: Move common code out of {async_, }run_on_cpu()
> cpus: Rename flush_queued_work()
> linux-user: Use QemuMutex and QemuCond
> linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick()
> cpus-common: move CPU work item management to common code
> tcg: Make tb_flush() thread safe
>
> Makefile.target | 2 +-
> bsd-user/main.c | 30 +----
> cpu-exec.c | 12 +-
>  cpus-common.c              | 314 +++++++++++++++++++++++++++++++++++++++++++++
> cpus.c | 99 +-------------
> docs/tcg-exclusive.promela | 224 ++++++++++++++++++++++++++++++++
> exec.c | 30 +----
> hw/i386/kvm/apic.c | 3 +-
> hw/i386/kvmvapic.c | 6 +-
> hw/ppc/ppce500_spin.c | 31 ++---
> hw/ppc/spapr.c | 6 +-
> hw/ppc/spapr_hcall.c | 17 +--
> include/exec/cpu-all.h | 4 +
> include/exec/cpu-common.h | 2 +
> include/exec/exec-all.h | 11 --
> include/exec/tb-context.h | 2 +-
> include/qom/cpu.h | 95 +++++++++++++-
> kvm-all.c | 21 +--
> linux-user/main.c | 130 ++++++-------------
> target-i386/helper.c | 19 ++-
> target-i386/kvm.c | 6 +-
> target-s390x/cpu.c | 4 +-
> target-s390x/cpu.h | 7 +-
> target-s390x/kvm.c | 98 +++++++-------
> target-s390x/misc_helper.c | 4 +-
> translate-all.c | 38 ++++--
> vl.c | 1 +
> 27 files changed, 814 insertions(+), 402 deletions(-)
> create mode 100644 cpus-common.c
> create mode 100644 docs/tcg-exclusive.promela
--
Alex Bennée