From: Ilya Leoshkevich
Subject: Re: [PULL v2 42/61] physmem: Add helper function to destroy CPU AddressSpace
Date: Wed, 05 Feb 2025 13:11:30 +0100
User-agent: Evolution 3.52.4 (3.52.4-2.fc40)

On Mon, 2024-08-19 at 16:22 +0100, Peter Maydell wrote:
> On Tue, 23 Jul 2024 at 11:59, Michael S. Tsirkin <mst@redhat.com>
> wrote:
> > 
> > From: Salil Mehta <salil.mehta@huawei.com>
> > 
> > Virtual CPU hot-unplug leads to unrealization of a CPU object,
> > which also involves destruction of the CPU AddressSpace. Add a
> > common helper function to destroy the CPU AddressSpace.
> 
> Based on some testing I've been doing with (a variation of) this
> function to clean up the CPU address spaces, I think there's a
> problem with it. (This doesn't matter for 9.1, because nothing
> calls the function yet.)
> 
> > +void cpu_address_space_destroy(CPUState *cpu, int asidx)
> > +{
> > +    CPUAddressSpace *cpuas;
> > +
> > +    assert(cpu->cpu_ases);
> > +    assert(asidx >= 0 && asidx < cpu->num_ases);
> > +    /* KVM cannot currently support multiple address spaces. */
> > +    assert(asidx == 0 || !kvm_enabled());
> > +
> > +    cpuas = &cpu->cpu_ases[asidx];
> > +    if (tcg_enabled()) {
> > +        memory_listener_unregister(&cpuas->tcg_as_listener);
> > +    }
> > +
> > +    address_space_destroy(cpuas->as);
> > +    g_free_rcu(cpuas->as, rcu);
> 
> RCU doesn't guarantee the order in which it executes the
> RCU reclaim hooks, so the g_free() of cpuas->as can run
> *before* the do_address_space_destroy hook that
> address_space_destroy() sets up. This means we free the
> RCU node that the latter hook is using, and then
> do_address_space_destroy is never called. (I was also
> seeing the RCU callback thread stall entirely, because
> the list node it wanted to traverse was garbage.)
> 
> However, I don't understand how to fix this -- how is a
> caller of address_space_destroy() supposed to know when it
> can free the memory containing the AddressSpace?
> Paolo: do you understand how this should work? We already
> use address_space_destroy() in various places, usually for
> an AS that's embedded in a device struct -- how do we
> ensure that the destroy has finished before we free the
> device memory?
> 
> > +
> > +    if (asidx == 0) {
> > +        /* reset the convenience alias for address space 0 */
> > +        cpu->as = NULL;
> > +    }
> > +
> > +    if (--cpu->cpu_ases_count == 0) {
> > +        g_free(cpu->cpu_ases);
> > +        cpu->cpu_ases = NULL;
> > +    }
> > +}
> 
> thanks
> -- PMM

I tried using this function, and there are indeed a couple of
problems with it. First, with TCG, there is a use-after-free:

    Thread 1:
    
        cpu_address_space_destroy()
          memory_listener_unregister()
            listener_del_address_space()
              tcg_commit()  # via listener->commit()
                async_run_on_cpu(tcg_commit_cpu, cpuas)
          g_free(cpu->cpu_ases)  # frees cpuas
    
    Thread 2:
    
        mttcg_cpu_thread_fn()
          process_queued_cpu_work()
            tcg_commit_cpu()
              cpuas->as  # cpuas is freed

This can be fixed by passing asidx instead of cpuas to
tcg_commit_cpu() and checking there whether the address space still
exists before dereferencing it.
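
Roughly like this (a sketch only, not a finished patch; it assumes
cpu_address_space_destroy() is also changed to clear cpuas->as, so
that the check below can observe the destruction):

    static void tcg_commit_cpu(CPUState *cpu, run_on_cpu_data data)
    {
        int asidx = data.host_int;
        CPUAddressSpace *cpuas;

        /* The hot-unplug path may have destroyed the address space
         * (or freed the whole cpu_ases array) before this work item
         * got to run on the vCPU thread.
         */
        if (!cpu->cpu_ases || asidx >= cpu->num_ases ||
            !cpu->cpu_ases[asidx].as) {
            return;
        }
        cpuas = &cpu->cpu_ases[asidx];
        cpuas->memory_dispatch = address_space_to_dispatch(cpuas->as);
        tlb_flush(cpu);
    }

    /* ...and in tcg_commit(), queue the index, not the pointer: */
        async_run_on_cpu(cpuas->cpu, tcg_commit_cpu,
                         RUN_ON_CPU_HOST_INT(cpuas -
                                             cpuas->cpu->cpu_ases));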

Second, g_free_rcu() enqueues the same rcu node that
address_space_destroy() -> call_rcu(do_address_space_destroy) has
already enqueued, which prevents do_address_space_destroy() from
running at all. Coupled with the ordering concern above, I think the
right way to fix this is to introduce address_space_destroy_and_free()
that runs both actions in the correct order.
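
Something like this, perhaps (a minimal sketch, assuming the helper
lives in system/memory.c next to address_space_destroy() so it can
reuse the static do_address_space_destroy(); the details are
assumptions, not a finished patch):

    static void do_address_space_destroy_free(AddressSpace *as)
    {
        /* One RCU callback performs both steps, so the free can
         * never be reordered before the teardown.
         */
        do_address_space_destroy(as);
        g_free(as);
    }

    void address_space_destroy_and_free(AddressSpace *as)
    {
        MemoryRegion *root = as->root;

        /* Same preparation as in address_space_destroy()... */
        memory_region_transaction_begin();
        as->root = NULL;
        memory_region_transaction_commit();
        QTAILQ_REMOVE(&address_spaces, as, address_spaces_link);
        as->root = root;

        /* ...but enqueue as->rcu only once, with a callback that
         * also frees the AddressSpace afterwards.
         */
        call_rcu(as, do_address_space_destroy_free, rcu);
    }

cpu_address_space_destroy() would then call
address_space_destroy_and_free(cpuas->as) instead of the
address_space_destroy()/g_free_rcu() pair, so that as->rcu is
enqueued exactly once.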


