qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: UI layer threading and locking strategy; memory_region_snapshot_and_


From: Peter Maydell
Subject: Re: UI layer threading and locking strategy; memory_region_snapshot_and_clear_dirty() races
Date: Tue, 22 Nov 2022 11:52:10 +0000

On Tue, 22 Nov 2022 at 08:04, Akihiko Odaki <akihiko.odaki@gmail.com> wrote:
>
> Hi,
>
> On 2022/11/22 7:37, Philippe Mathieu-Daudé wrote:
> > Cc'ing more UI/display contributors.
> >
> > On 17/11/22 14:05, Peter Maydell wrote:
> >> On Tue, 1 Nov 2022 at 14:17, Peter Maydell <peter.maydell@linaro.org>
> >> wrote:
> >>>
> >>> Hi; I'm trying to find out what the UI layer's threading and
> >>> locking strategy is, at least as far as it applies to display
> >>> device models.
> >>
> >> Ping! :-) I'm still looking for information about this,
> >> and about what threads call_rcu() callbacks might be run on...
>
> I briefly checked the code, and it looks like rcu has its own thread.
> Search for "thread" in util/rcu.c. Looking at call_rcu_thread() in the
> file will tell what kind of context the callbacks will be ran on.

Yes, I can read the code to find out what it does currently.
I'm hoping for an authoritative answer from the designer
about what the API guarantees are...

> >>> My tentative idea for a fix is a bit of an upheaval:
> >>>   * have the display devices set gfx_update_async = true
> >>>   * instead of doing everything synchronously in their gfx_update
> >>>     method, they do the initial setup and call an 'async' version
> >>>     of memory_region_snapshot_and_clear_dirty()
> >>>   * that async version of the function will do what it does today,
> >>>     but without trying to wait for TCG vCPUs
> >>>   * instead the caller arranges (via call_rcu(), probably) a
> >>>     callback that will happen once all the TCG CPUs have finished
> >>>     executing their current TB
> >>>   * that callback does the actual copy-from-guest-ram-to-display
> >>>     and then calls graphic_hw_update_done()
> >>>
> >>> This seems like an awful pain in the neck but I couldn't see
> >>> anything better :-(
>
> Converting memory_region_snapshot_and_clear_dirty() asynchronous is
> nice, but also don't forget about reordering things in
> framebuffer_update_display() so that the DisplaySurface reference
> happens after memory_region_snapshot_and_clear_dirty(). Even if you make
> memory_region_snapshot_and_clear_dirty() asynchronous, the bug will
> remain if you keep the stale reference of the DisplaySurface and pass it
> to the asynchronous callback.

Yes, it would need to either rearrange things, or else just take
a reference to the DisplaySurface and hold on to it until it's
done (which might result in a harmless write to a surface that's
no longer being actively used and will be thrown away when we
drop our reference). I think you probably need to take the reference,
because to identify the right arguments to take the memory snapshot
you need to know various properties of the DisplaySurface (eg its
size), so you need the DS both before and after the snapshot call.

thanks
-- PMM



reply via email to

[Prev in Thread] Current Thread [Next in Thread]