Re: [PATCH v2 8/9] async: update documentation of the memory barriers

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 8/9] async: update documentation of the memory barriers

From:	Paolo Bonzini
Subject:	Re: [PATCH v2 8/9] async: update documentation of the memory barriers
Date:	Tue, 7 Mar 2023 11:49:54 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.7.1

On 3/7/23 00:39, Richard Henderson wrote:

On 3/6/23 14:33, Paolo Bonzini wrote:
@@ -107,11 +114,8 @@ static QEMUBH *aio_bh_dequeue(BHList *head,unsigned *flags)
      QSLIST_REMOVE_HEAD(head, next);
      /*
- * The qatomic_and is paired with aio_bh_enqueue(). The implicitmemory- * barrier ensures that the callback sees all writes done by thescheduling- * thread. It also ensures that the scheduling thread sees thecleared- * flag before bh->cb has run, and thus will call aio_notifyagain if
-     * necessary.
+ * Synchronizes with qatomic_fetch_or() in aio_bh_enqueue(),ensuring that
+     * the removal finishes before BH_PENDING is reset.
       */
      *flags = qatomic_fetch_and(&bh->flags,
Per this new comment, about the remove finishing first, it would seemthat we need smp_mb__before_rmw here, because QSLIST_REMOVE_HEAD is notSEQCST.

Only release-acquire is needed for consistent access to the list, seqcstand smp_mb__before/after_rmw() are only needed (as expected) to handlewakeup.

Just to be safe, I tried modeling this with cppmem(http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/); support forcompare-and-swap is very limited, therefore the test looks nothing likethe C code(*), but it should be ok:


int main() {
  atomic_int x = 0;
  atomic_int y = 0;
  {{{
      { cas_strong_explicit(&x, 0, 1, mo_acquire, mo_acquire);
        x.load(mo_relaxed).readsvalue(1); // fetch_or returned 0
//      r1 = y.load(mo_relaxed);
        y.store(1, mo_release); }         // bh inserted

  ||| { y.load(mo_acquire).readsvalue(1); // bh was in list
        y.store(0, mo_relaxed);           // bh removed
//      r1 = x.load(mo_relaxed);
        x.store(0, mo_release); }         // fetch_and

  ||| { cas_strong_explicit(&x, 0, 2, mo_acquire, mo_acquire);
        x.load(mo_relaxed).readsvalue(2); // fetch_or returned 0
//      r1 = y.load(mo_relaxed);
        y.store(2, mo_release); }         // bh inserted

  }}};
  return 0;
}

You can uncomment the debugging instructions (r1 = ...) too, but leavingall three uncommented will blow up.

Use the release_acquire model since it's faster and there are no seqcstoperations in the above test. It will take 1-2 minutes to run the modelon a laptop, and the result shows that the only consistent (i.e. allowedby the C standard) execution is the one where thread 0 runs entirelybefore thread 1, and thread 1 runs entirely before thread 2. You getthe opposite order if the "bh was in list" line is changed to"readsvalue(2)".


This matches the description I had sent yesterday.

Paolo

(*) A couple points of interest. First, there is no way to say "CAS hassucceeded" so I am writing different values to x (this is not a problembecause the code in QEMU only checks whether the pending bit was zero)and checking that they can be read back (which isn't a big limitationbecause other threads could sneak in anyway right after the checkingload). Second, there is no way to say "reads something other than 0",so you cannot get both valid executions with one run of the model,instead you can change the "bh was in list" line to "readsvalue(2)".

[Prev in Thread]

Current Thread

[Next in Thread]

[PATCH v2 2/9] qemu-thread-posix: cleanup, fix, document QemuEvent, (continued)
- [PATCH v2 2/9] qemu-thread-posix: cleanup, fix, document QemuEvent, Paolo Bonzini, 2023/03/06
- [PATCH v2 3/9] qemu-thread-win32: cleanup, fix, document QemuEvent, Paolo Bonzini, 2023/03/06
- [PATCH v2 4/9] edu: add smp_mb__after_rmw(), Paolo Bonzini, 2023/03/06
- [PATCH v2 5/9] aio-wait: switch to smp_mb__after_rmw(), Paolo Bonzini, 2023/03/06
  - Re: [PATCH v2 5/9] aio-wait: switch to smp_mb__after_rmw(), David Hildenbrand, 2023/03/07
- [PATCH v2 6/9] qemu-coroutine-lock: add smp_mb__after_rmw(), Paolo Bonzini, 2023/03/06
- [PATCH v2 7/9] physmem: add missing memory barrier, Paolo Bonzini, 2023/03/06
- [PATCH v2 8/9] async: update documentation of the memory barriers, Paolo Bonzini, 2023/03/06
  - Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Stefan Hajnoczi, 2023/03/06
  - Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Richard Henderson, 2023/03/06
    - Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Paolo Bonzini <=
    - Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Richard Henderson, 2023/03/07
    - Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Paolo Bonzini, 2023/03/07
    - Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Richard Henderson, 2023/03/07
    - Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Paolo Bonzini, 2023/03/08
    - Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Richard Henderson, 2023/03/08
    - Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Paolo Bonzini, 2023/03/08
    - Re: [PATCH v2 8/9] async: update documentation of the memory barriers, Richard Henderson, 2023/03/08
- [PATCH v2 9/9] async: clarify usage of barriers in the polling case, Paolo Bonzini, 2023/03/06
  - Re: [PATCH v2 9/9] async: clarify usage of barriers in the polling case, Stefan Hajnoczi, 2023/03/06
  - Re: [PATCH v2 9/9] async: clarify usage of barriers in the polling case, David Hildenbrand, 2023/03/07

Prev by Date: Re: Adding Implementation Defined ARM cpu registers
Next by Date: [PATCH v1] Modified BMC FRU byte data in yosemitev2
Previous by thread: Re: [PATCH v2 8/9] async: update documentation of the memory barriers
Next by thread: Re: [PATCH v2 8/9] async: update documentation of the memory barriers
Index(es):
- Date
- Thread