[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v3] timer: Fix a race condition between timer's callback and
From: |
Paolo Bonzini |
Subject: |
Re: [PATCH v3] timer: Fix a race condition between timer's callback and destroying code |
Date: |
Thu, 27 Jun 2024 19:15:49 +0200 |
On Thu, Jun 27, 2024 at 6:12 PM Roman Kiryanov <rkir@google.com> wrote:
>
> On Thu, Jun 27, 2024 at 6:27 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On Thu, Jun 27, 2024 at 2:32 AM Roman Kiryanov <rkir@google.com> wrote:
> > > + if (qatomic_read(&ts->cb_running)) {
> > > + qemu_event_wait(&timer_list->timers_done_ev);
> > > + }
> >
> > qemu_event_wait() already has the right atomic magic, and
> > ts->cb_running is both redundant (in general), and I think racy (as
> > implemented in this patch).
>
> I added cb_running to avoid waiting for timers_done_ev if we know our
> cb is done.
Yes, but it's very tricky. Assuming we want to fix it in the timer
core, the QemuEvent should be enough, no need to optimize it. On the
other hand, I'm still worried about deadlocks (more below).
> > But especially, you haven't justified in the commit message _why_ you
> > need this.
>
> I mentioned the problem of cleanup racing with the timer's callback function
> in the current shape of QEMU.
Yes, but it was not clear what are the involved threads. It is clear
now that you have a function in a separate thread, creating a timer in
the main QEMU event loop.
> > using
> > aio_bh_schedule_oneshot() or aio_wait_bh_oneshot() to synchronize
> > everything with the AioContext thread seems like a superior solution
> > to me.
>
> Could you please elaborate? The problem we want to solve is this:
>
> void myThreadFunc() {
> CallbackState callbackState;
> QEMUTimer timer;
>
> timer_init(&timer, myClockType, myScale, &myTimerCallbackFunc,
> &callbackState);
> ...
> timer_del(&timer);
> }
>
> Currently, myTimerCallbackFunc could fire after myThreadFunc exits
> (if timer_del runs between qemu_mutex_unlock and cb(opaque) in
> timerlist_run_timers) and callbackState gets destroyed.
Ok, got it now. I agree that qemu_event_wait() is safe for you here
because you are in a completely separate thread. But I'm worried that
it causes deadlocks in QEMU where the timer callback and the timer_del
run in the same thread.
I think the easiest options would be:
1) if possible, allocate the timer and the callbackState statically in
the device.
2) use "aio_wait_bh_oneshot(qemu_get_aio_context(), [](void
*opaque){}, NULL);" after timer_del(). You can also put the timer and
the callbackState in a RAII wrapper, so that aio_wait_bh_oneshot() is
executed when the RAII wrapper is destructed
Another thing that you could do is to use a shared_ptr<> for the
timer+callbackState combo, and pass a weak_ptr<> to the timer. Then:
- at the beginning of the timer, you upgrade the weak_ptr with lock()
and if it fails, return
- at the end of myThreadfunc, you destruct the shared_ptr before
deleting the timer.
I'm not sure how you'd pass the weak_ptr/shared_ptr to a callback
(Rust has Weak::into_raw/Weak::from_raw, but I don't know C++ well
enough). That may be overkill.
Paolo