[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 8/9] async: update documentation of the memory barriers
From: |
Paolo Bonzini |
Subject: |
Re: [PATCH v2 8/9] async: update documentation of the memory barriers |
Date: |
Wed, 8 Mar 2023 19:08:05 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.7.1 |
On 3/8/23 17:47, Richard Henderson wrote:
The case that I was imagining for smp_mb__before_rmw() is something
like this:
wake_me = true;
smp_mb__before_rmw();
if (qatomic_xchg(&can_sleep, true)) { ... }
where you really need a full barrier.
What is different about this that doesn't apply to the remove-head case
we've been talking about?
For remove-head, nothing is going to change the BH_PENDING flag while
the code runs. This would be an okay transformation of the code, at
both the compiler and the processor level:
// first part of atomic_fetch_and
old_flags = LDAXR of bh->flags
// QSLIST_REMOVE_HEAD ends up between load and store of
// atomic_fetch_and
all the loads and stores for remove-head
// second part of atomic_fetch_and
new_flags = old_flags & ~(BH_PENDING|BH_SCHEDULED|BH_IDLE);
(successful) STLXR of new_flags into bh->flags
Instead in the case above, I was thinking you would get a missed wakeup
without the barriers. Here is the full pseudocode:
// this store can move below the load of can_sleep
qatomic_set(&wake_me, true);
if (qatomic_xchg(&can_sleep, true)) sleep;
// this store can move below the load of wake_me
qatomic_set(&can_sleep, false);
if (qatomic_xchg(&wake_me, false)) wake them;
The buggy case is where the threads observe can_sleep = true, wake_me =
false, i.e. the original value of the variables. Let's look at it with
CPPMEM.
Like before, the CPPMEM test must use CAS and .readsvalue() to hack
around the lack of "if"s. Two .readsvalue() clauses represent the buggy
case, so we are all good if there is *no* consistent executions.
Unfortunately, it fails:
// 2 consistent (i.e. buggy) executions
int main() {
atomic_int w = 0;
atomic_int s = 1;
{{{
{ w.store(1, mo_relaxed);
// the buggy case is the one in which the threads read the
// original value of w and s. So here the CAS writes a
// dummy value different from both 0 and 1
cas_strong_explicit(&s, 0, 99, mo_seq_cst, mo_seq_cst);
s.load(mo_relaxed).readsvalue(1); }
|||
{ s.store(0, mo_relaxed);
// same as above
cas_strong_explicit(&w, 1, 99, mo_seq_cst, mo_seq_cst);
w.load(mo_relaxed).readsvalue(0); }
}}}
}
It works with barriers, which models the extra smp_mb__before_rmw():
// no consistent executions (i.e. it works)
int main() {
atomic_int w = 0;
atomic_int s = 1;
{{{
{ w.store(1, mo_relaxed);
atomic_thread_fence(mo_seq_cst);
cas_strong_explicit(&s, 0, 99, mo_relaxed, mo_relaxed);
s.load(mo_relaxed).readsvalue(1); }
|||
{ s.store(0, mo_relaxed);
atomic_thread_fence(mo_seq_cst);
cas_strong_explicit(&w, 1, 99, mo_relaxed, mo_relaxed);
w.load(mo_relaxed).readsvalue(0); }
}}}
}
I think I agree with you that _in practice_ it's going to work at the
processor level; the pseudo-assembly would be
r1 = LDAXR(can_sleep);
r2 = LDAXR(wake_me);
STR(can_sleep, false);
STLXR(wake_me, false); // successful
STR(wake_me, true);
STLXR(can_sleep, true); // successful (?)
if r1 == 0 { ... }
if r2 != 0 { ... }
and I can't think of a way in which both store-exclusives (or xchg, or
cmpxchg) would succeed. So perhaps smp_mb__before_rmw() could indeed be
a signal_fence(). But that's quite scary even for the standards of this
discussion...
Paolo
- [PATCH v2 7/9] physmem: add missing memory barrier, (continued)
- [PATCH v2 7/9] physmem: add missing memory barrier, Paolo Bonzini, 2023/03/06
- [PATCH v2 8/9] async: update documentation of the memory barriers, Paolo Bonzini, 2023/03/06
- Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Stefan Hajnoczi, 2023/03/06
- Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Richard Henderson, 2023/03/06
- Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Paolo Bonzini, 2023/03/07
- Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Richard Henderson, 2023/03/07
- Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Paolo Bonzini, 2023/03/07
- Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Richard Henderson, 2023/03/07
- Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Paolo Bonzini, 2023/03/08
- Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Richard Henderson, 2023/03/08
- Re: [PATCH v2 8/9] async: update documentation of the memory barriers,
Paolo Bonzini <=
- Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Richard Henderson, 2023/03/08
[PATCH v2 9/9] async: clarify usage of barriers in the polling case, Paolo Bonzini, 2023/03/06