qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 8/9] async: update documentation of the memory barriers


From: Paolo Bonzini
Subject: Re: [PATCH v2 8/9] async: update documentation of the memory barriers
Date: Tue, 7 Mar 2023 11:49:54 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.7.1

On 3/7/23 00:39, Richard Henderson wrote:
On 3/6/23 14:33, Paolo Bonzini wrote:
@@ -107,11 +114,8 @@ static QEMUBH *aio_bh_dequeue(BHList *head, unsigned *flags)
      QSLIST_REMOVE_HEAD(head, next);
      /*
-     * The qatomic_and is paired with aio_bh_enqueue().  The implicit memory -     * barrier ensures that the callback sees all writes done by the scheduling -     * thread.  It also ensures that the scheduling thread sees the cleared -     * flag before bh->cb has run, and thus will call aio_notify again if
-     * necessary.
+     * Synchronizes with qatomic_fetch_or() in aio_bh_enqueue(), ensuring that
+     * the removal finishes before BH_PENDING is reset.
       */
      *flags = qatomic_fetch_and(&bh->flags,

Per this new comment, about the remove finishing first, it would seem that we need smp_mb__before_rmw here, because QSLIST_REMOVE_HEAD is not SEQCST.

Only release-acquire is needed for consistent access to the list, seqcst and smp_mb__before/after_rmw() are only needed (as expected) to handle wakeup.

Just to be safe, I tried modeling this with cppmem (http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/); support for compare-and-swap is very limited, therefore the test looks nothing like the C code(*), but it should be ok:

int main() {
  atomic_int x = 0;
  atomic_int y = 0;
  {{{
      { cas_strong_explicit(&x, 0, 1, mo_acquire, mo_acquire);
        x.load(mo_relaxed).readsvalue(1); // fetch_or returned 0
//      r1 = y.load(mo_relaxed);
        y.store(1, mo_release); }         // bh inserted

  ||| { y.load(mo_acquire).readsvalue(1); // bh was in list
        y.store(0, mo_relaxed);           // bh removed
//      r1 = x.load(mo_relaxed);
        x.store(0, mo_release); }         // fetch_and

  ||| { cas_strong_explicit(&x, 0, 2, mo_acquire, mo_acquire);
        x.load(mo_relaxed).readsvalue(2); // fetch_or returned 0
//      r1 = y.load(mo_relaxed);
        y.store(2, mo_release); }         // bh inserted

  }}};
  return 0;
}

You can uncomment the debugging instructions (r1 = ...) too, but leaving all three uncommented will blow up.

Use the release_acquire model since it's faster and there are no seqcst operations in the above test. It will take 1-2 minutes to run the model on a laptop, and the result shows that the only consistent (i.e. allowed by the C standard) execution is the one where thread 0 runs entirely before thread 1, and thread 1 runs entirely before thread 2. You get the opposite order if the "bh was in list" line is changed to "readsvalue(2)".

This matches the description I had sent yesterday.

Paolo

(*) A couple points of interest. First, there is no way to say "CAS has succeeded" so I am writing different values to x (this is not a problem because the code in QEMU only checks whether the pending bit was zero) and checking that they can be read back (which isn't a big limitation because other threads could sneak in anyway right after the checking load). Second, there is no way to say "reads something other than 0", so you cannot get both valid executions with one run of the model, instead you can change the "bh was in list" line to "readsvalue(2)".





reply via email to

[Prev in Thread] Current Thread [Next in Thread]