Re: [Qemu-devel] [PULL 15/36] memory: fix race between TCG and accesses

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PULL 15/36] memory: fix race between TCG and accesses

From:	Peter Maydell
Subject:	Re: [Qemu-devel] [PULL 15/36] memory: fix race between TCG and accesses to dirty bitmap
Date:	Tue, 2 Aug 2022 17:17:27 +0100

On Tue, 20 Aug 2019 at 08:12, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> There is a race between TCG and accesses to the dirty log:
>
>       vCPU thread                  reader thread
>       -----------------------      -----------------------
>       TLB check -> slow path
>         notdirty_mem_write
>           write to RAM
>           set dirty flag
>                                    clear dirty flag
>       TLB check -> fast path
>                                    read memory
>         write to RAM
>
> Fortunately, in order to fix it, no change is required to the
> vCPU thread.  However, the reader thread must delay the read after
> the vCPU thread has finished the write.  This can be approximated
> conservatively by run_on_cpu, which waits for the end of the current
> translation block.
>
> A similar technique is used by KVM, which has to do a synchronous TLB
> flush after doing a test-and-clear of the dirty-page flags.
>
> Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

> +static void tcg_log_global_after_sync(MemoryListener *listener)
> +{
> +    CPUAddressSpace *cpuas;
> +
> +    /* Wait for the CPU to end the current TB.  This avoids the following
> +     * incorrect race:
> +     *
> +     *      vCPU                         migration
> +     *      ----------------------       -------------------------
> +     *      TLB check -> slow path
> +     *        notdirty_mem_write
> +     *          write to RAM
> +     *          mark dirty
> +     *                                   clear dirty flag
> +     *      TLB check -> fast path
> +     *                                   read memory
> +     *        write to RAM
> +     *
> +     * by pushing the migration thread's memory read after the vCPU thread 
> has
> +     * written the memory.
> +     */
> +    cpuas = container_of(listener, CPUAddressSpace, tcg_as_listener);
> +    run_on_cpu(cpuas->cpu, do_nothing, RUN_ON_CPU_NULL);
> +}

So I think this implementation has two problems that we've only just
noticed:

(1) if memory_global_after_dirty_log_sync() is called in a context
where the caller holds the BQL, then the implementation here will
temporarily drop the BQL inside run_on_cpu(). That breaks
assumptions in pretty much all the graphics device models which
call memory_region_snapshot_and_clear_dirty() and assume that
things they'd determined before the call stay valid (pointers
into DisplaySurfaces, information about size and location of
the framebuffer in guest memory, etc).

(2) if memory_global_after_dirty_log_sync() is called in a context
where the caller does *not* hold the BQL, this is undefined
behaviour, because run_on_cpu() calls qemu_cond_wait() passing
it the BKL, and at least for the posix implementation that
ends up being posix_cond_wait() and calling that without the
mutex held is undefined behaviour. This happens for the migration
code for everything except the final iteration.

Does anybody have any bright ideas for fixing these? (2) I guess
we could handle just by having the migration code (or this
code right here) grab the BQL if it doesn't already have it.
(1) is much harder: you kind of conceptually want to do a "go back
and restart the update-display operation", but there's no clear
way to tell if you need to do that. You can postpone some things
(like "get the current UI display surface") until after the
call to memory_region_snapshot_and_clear_dirty(), and you could
have a "check a flag to see if the guest re-entered the display
device while we were inside memory_region_snapshot_and_clear_dirty()"
test, but some things the guest might do could potentially invalidate
things in a non-obvious action-at-a-distance way that doesn't
require an access to the display device. (e.g. a display device that
lets the guest program the FB to be anywhere in the address map,
plus the ability for the guest to cause remapping of RAM banks
or similar -- the display device can't tell if the RAM banks
got remapped or not, that's handled by a different device.)

thanks
-- PMM

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-devel] [PULL 15/36] memory: fix race between TCG and accesses to dirty bitmap, Peter Maydell <=

Prev by Date: [PULL 5/5] virtiofsd: Disable killpriv_v2 by default
Next by Date: Re: [PATCH v7 11/14] KVM: Register/unregister the guest private memory regions
Previous by thread: [PULL 0/5] migration queue
Next by thread: [PATCH for 7.1] linux-user: fix compat with glibc >= 2.36 sys/mount.h
Index(es):
- Date
- Thread