Subject: Re: [Qemu-devel] [PATCH 16/16] cpus-common: lock-free fast path for cpu_exec_start/end
From: Paolo Bonzini
Date: Wed, 21 Sep 2016 20:19:18 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0
On 21/09/2016 19:24, Emilio G. Cota wrote:
> On Mon, Sep 19, 2016 at 14:50:59 +0200, Paolo Bonzini wrote:
>> Set cpu->running without taking the cpu_list lock, only look at it if
>> there is a concurrent exclusive section. This requires adding a new
>> field to CPUState, which records whether a running CPU is being counted
>> in pending_cpus. When an exclusive section is started concurrently with
>> cpu_exec_start, cpu_exec_start can use the new field to wait for the end
>> of the exclusive section.
>>
>> This is a separate patch for easier bisection of issues.
>>
>> Signed-off-by: Paolo Bonzini <address@hidden>
>> ---
>>  cpus-common.c              | 73 ++++++++++++++++++++++++++++++++++++++++------
>>  docs/tcg-exclusive.promela | 53 +++++++++++++++++++++++++++++++--
>>  include/qom/cpu.h          |  5 ++--
>>  3 files changed, 117 insertions(+), 14 deletions(-)
>>
>> diff --git a/cpus-common.c b/cpus-common.c
>> index f7ad534..46cf8ef 100644
>> --- a/cpus-common.c
>> +++ b/cpus-common.c
>> @@ -184,8 +184,12 @@ void start_exclusive(void)
>>  
>>      /* Make all other cpus stop executing.  */
>>      pending_cpus = 1;
>> +
>> +    /* Write pending_cpus before reading other_cpu->running.  */
>> +    smp_mb();
>>      CPU_FOREACH(other_cpu) {
>>          if (other_cpu->running) {
>> +            other_cpu->has_waiter = true;
>>              pending_cpus++;
>>              qemu_cpu_kick(other_cpu);
>>          }
>> @@ -212,24 +216,75 @@ void end_exclusive(void)
>>  /* Wait for exclusive ops to finish, and begin cpu execution.  */
>>  void cpu_exec_start(CPUState *cpu)
>>  {
>> -    qemu_mutex_lock(&qemu_cpu_list_mutex);
>> -    exclusive_idle();
>>      cpu->running = true;
>> -    qemu_mutex_unlock(&qemu_cpu_list_mutex);
>> +
>> +    /* Write cpu->running before reading pending_cpus.  */
>> +    smp_mb();
>> +
>> +    /* 1. start_exclusive saw cpu->running == true and pending_cpus >= 1.
>> +     *    After taking the lock we'll see cpu->has_waiter == true and run---not
>> +     *    for long because start_exclusive kicked us.  cpu_exec_end will
>> +     *    decrement pending_cpus and signal the waiter.
>> +     *
>> +     * 2. start_exclusive saw cpu->running == false but pending_cpus >= 1.
>> +     *    This includes the case when an exclusive item is running now.
>> +     *    Then we'll see cpu->has_waiter == false and wait for the item to
>> +     *    complete.
>> +     *
>> +     * 3. pending_cpus == 0.  Then start_exclusive is definitely going to
>> +     *    see cpu->running == true, and it will kick the CPU.
>> +     */
>> +    if (pending_cpus) {
>> +        qemu_mutex_lock(&qemu_cpu_list_mutex);
>> +        if (!cpu->has_waiter) {
>> +            /* Not counted in pending_cpus, let the exclusive item
>> +             * run.  Since we have the lock, set cpu->running to true
>> +             * while holding it instead of retrying.
>> +             */
>> +            cpu->running = false;
>> +            exclusive_idle();
>> +            /* Now pending_cpus is zero.  */
>> +            cpu->running = true;
>> +        } else {
>> +            /* Counted in pending_cpus, go ahead.  */
>> +        }
>> +        qemu_mutex_unlock(&qemu_cpu_list_mutex);
>> +    }
>
> wrt scenario (3): I don't think other threads will always see cpu->running ==
> true.
> Consider the following:
>
>     cpu0                                 cpu1
>     ----                                 ----
>
>     cpu->running = true;                 pending_cpus = 1;
>     smp_mb();                            smp_mb();
>     if (pending_cpus) { /* false */ }    CPU_FOREACH(other_cpu) {
>                                              if (other_cpu->running) { /* false */ }
>                                          }
>
> The barriers here don't guarantee that changes are immediately visible
> to others (for that we need strong ops, i.e. atomics).
No, this is not true.  Barriers order stores and loads within a thread
_and_ establish synchronizes-with edges.

The interleaving above would violate causality:

- cpu0 stores cpu->running before loading pending_cpus;
- because cpu0 loads pending_cpus == 0, cpu1's store of pending_cpus = 1
  comes after cpu0's load;
- cpu1 loads other_cpu->running after it stores pending_cpus.

Hence the only valid ordering is

    cpu->running = true          (cpu0 store)
    if (pending_cpus)            (cpu0 load, sees 0)
    pending_cpus = 1             (cpu1 store)
    if (other_cpu->running)      (cpu1 load)

and the final load has to observe cpu0's store, i.e. cpu1 sees
cpu->running == true.
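
To make this concrete, here is a minimal stand-alone model of the
pattern, using C11 atomics with seq_cst fences standing in for QEMU's
smp_mb().  The variable names mirror the example above; this is an
illustration of the memory-ordering argument, not QEMU code:

    /* sb.c -- store-buffering litmus test.
     * Build: cc -O2 -pthread sb.c -o sb
     */
    #include <assert.h>
    #include <pthread.h>
    #include <stdatomic.h>

    static atomic_int running;       /* stands in for cpu->running  */
    static atomic_int pending_cpus;  /* stands in for pending_cpus  */
    static int r0, r1;               /* what each thread loaded     */

    static void *cpu0(void *arg)
    {
        atomic_store_explicit(&running, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* smp_mb() */
        r0 = atomic_load_explicit(&pending_cpus, memory_order_relaxed);
        return NULL;
    }

    static void *cpu1(void *arg)
    {
        atomic_store_explicit(&pending_cpus, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* smp_mb() */
        r1 = atomic_load_explicit(&running, memory_order_relaxed);
        return NULL;
    }

    int main(void)
    {
        for (int i = 0; i < 100000; i++) {
            pthread_t t0, t1;

            atomic_store(&running, 0);
            atomic_store(&pending_cpus, 0);
            pthread_create(&t0, NULL, cpu0, NULL);
            pthread_create(&t1, NULL, cpu1, NULL);
            pthread_join(t0, NULL);
            pthread_join(t1, NULL);

            /* The two fences are totally ordered, so at least one
             * thread must observe the other's store: r0 == 0 && r1 == 0
             * is impossible.  */
            assert(r0 == 1 || r1 == 1);
        }
        return 0;
    }

Drop either fence and the r0 == r1 == 0 outcome becomes observable on
real hardware; with both fences in place the assertion never fires.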
> Is there a performance (scalability) reason behind this patch?
Yes: it speeds up all cpu_exec_start/end calls, _not_ start/end_exclusive.
With this patch, as long as there is no concurrent start/end_exclusive
(which is supposed to be rare), multiple CPUs can run cpu_exec_start/end
with no contention at all.  Without it, as the number of CPUs grows, the
global qemu_cpu_list_mutex becomes a bottleneck.
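
For completeness, the exit side has the same shape.  This is only a
sketch reconstructed from the commit message and the comments quoted
above, not the patch text; in particular the condition variable name
exclusive_cond is my assumption for whatever start_exclusive actually
waits on:

    /* Sketch only, not the actual patch.  Mark this CPU as no longer
     * running and, if start_exclusive counted us in pending_cpus,
     * drop our count and wake the waiter.  */
    void cpu_exec_end(CPUState *cpu)
    {
        cpu->running = false;

        /* Write cpu->running before reading pending_cpus; pairs with
         * the smp_mb() in start_exclusive.  */
        smp_mb();

        /* Fast path: no exclusive section pending, no lock taken.  */
        if (pending_cpus) {
            qemu_mutex_lock(&qemu_cpu_list_mutex);
            if (cpu->has_waiter) {
                cpu->has_waiter = false;
                /* pending_cpus == 1 means only the exclusive
                 * thread's own count remains, so it can proceed.  */
                if (--pending_cpus == 1) {
                    qemu_cond_signal(&exclusive_cond);
                }
            }
            qemu_mutex_unlock(&qemu_cpu_list_mutex);
        }
    }

The point is that the lock is only touched when pending_cpus is nonzero,
i.e. when an exclusive section is actually in flight.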
Paolo