qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: aio_wait_bh_oneshot() thread-safety question


From: Vladimir Sementsov-Ogievskiy
Subject: Re: aio_wait_bh_oneshot() thread-safety question
Date: Tue, 24 May 2022 16:46:21 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1

On 5/24/22 15:40, Kevin Wolf wrote:
Am 24.05.2022 um 09:08 hat Paolo Bonzini geschrieben:
On 5/23/22 18:04, Vladimir Sementsov-Ogievskiy wrote:

I have a doubt about how aio_wait_bh_oneshot() works. Exactly, I see
that data->done is not accessed atomically, and doesn't have any barrier
protecting it..

Is following possible:

main-loop                           iothread
                                  |
aio_wait_bh_oneshot()           |
      aio_bh_schedule_oneshot()   |
                                  |  handle bh:
                                  | 1. set data->done = true
                                  | 2. call aio_wait_kick(), inserting the
                                  | dummy bh into main context
                                  |
   ... in AIO_WAIT_WHILE():
     handle dummy bh, go to next
     iteration, but still read
     data->done=false due to some
     processor data reordering,
     go to next iteration of polling
     and hang
Yes, barriers are missing:

https://lore.kernel.org/qemu-devel/You6FburTi7gVyxy@stefanha-x1.localdomain/T/#md97146c6eae1fce2ddd687fdc3f2215eee03f6f4

It seems like the issue was never observed, at least on x86.

Why is the barrier in aio_bh_enqueue() not enough? Is the comment there
wrong?

aio_notify() has another barrier. This is a little bit too late, but if
I misunderstood the aio_bh_enqueue() one, it could explain why it was
never observed.

Kevin


I'd consider two cases:

1. aio_wait_kick() reads num_waiters as 0 and don't schedule any BH into main 
ctx.

In this case aio_wait_kick() only do one atomic operation: 
qatomic_read(&global_aio_wait.num_waiters), which is not a barrier as I 
understand.
So, data->done=true may be reordered with this operation.

main-loop                                iothread

  aio_wait_bh_oneshot()          |
     aio_bh_schedule_oneshot()   |
                                 |  atomic read num_waiters = 0 => don't kick
     AIO_WAIT_WHILE              |
      atomic inc num_waiters     |
      read done = false, go      |
      into blocking aio_poll()   |
                                 |  set data->done = true  # reordered to the 
end
                                 |    - but that doesn't help to wake main loop


For this case, iothread just don't call aio_bh_enqueue() and aio_notify(), so 
any barriers in them doesn't help


2. aio_wait_kick() reads num_waiters>0 and do schedule BH

In this case it seems you are right: if main-loop dequeued  dummy BH, it should be 
guaranteed that after handling this BH the main loop will see data->done=true.. 
That's if the comment is correct, hope it is. At least it corresponds to what I've 
read here : https://www.kernel.org/doc/Documentation/atomic_t.txt . How much 
generic this information is - I don't know.

In 2.12 there was no enque() deque() functions, but there was smp_wmb() in 
aio_bh_schedule_oneshot(), paired with atomic_xchg() in aio_bh_poll(), with 
similar comment about implicit barrier.


--
Best regards,
Vladimir



reply via email to

[Prev in Thread] Current Thread [Next in Thread]