qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues


From: David Hildenbrand
Subject: Re: [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues
Date: Thu, 2 Mar 2023 16:11:56 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0

On 02.03.23 15:45, Peter Xu wrote:
On Thu, Mar 02, 2023 at 10:46:56AM +0100, David Hildenbrand wrote:
On 25.02.23 17:31, Peter Xu wrote:
[not for merging, but for discussion; this is something I found when
   looking at another issue on Chuang's optimization for migration downtime]

Summary: we tried to access memory_listeners, address_spaces, etc. in RCU
way.  However we didn't implement them with RCU-safety. This patchset is
trying to do that; at least making it closer.

NOTE!  It's doing it wrongly for now, so please feel free to see this as a
thread to start discussing this problem, as in subject.

The core problem here is how to make sure memory listeners will be freed in
RCU ways, per when unlinking them from the global memory_listeners list.

Can you elaborate why we would want to do that? Is there a real reason we
cannot hold the BQL when unregistering a listener?

Yes afaict we must hold BQL when unregister any listener for now.  I added
an explicit assert in patch 1 for that.


Oh, good!

We want to do that because potentially we have RCU readers accessing these
two lists, so here taking BQL only is not enough.  We need to release the
objects after all users are gone.

We already do that for address spaces, but afaict the listener part was
overlooked.  The challenge here is how to achieve the same for listeners.

Ouch, ok thanks.



Or could we use any other, more fine-grained, lock to protect the memory
listeners?

Naive me would think that any interactions between someone updating the
memory listeners, and a listener getting removed, would require some careful
synchronization (to not rip a notifier out while someone else notifies --
what is the still registered notifier supposed to do with notifications
while it is already going away?), instead of doing it via RCU.

I'm all for using RCU if it improves performance and keeps things simple. If
RCU is neither required for performance reason and overcomplicates the
implementation, maybe using locking is the better choice.

For ASes, one major user RCU is memory_region_find_rcu().

For listeners, the only path that doesn't take BQL (afaict) is
memory_region_clear_dirty_bitmap().  Maybe you'll have some points here on
the side effect of taking it because it's in either virtio-mem or balloon
path for page hinting iirc.

So, we could end up in memory_region_clear_dirty_bitmap() when migration starts (clearing initial bitmap), while migration is happening (migrating one page), and during virtio-balloon qemu_guest_free_page_hint.

There should be no direct call from virtio-mem (anymore), only from virtio-balloon. I don't think taking the BQL is particularly problematic here.

I guess the main concern here would be overhead from gabbing/releasing the BQL very often, and blocking the BQL while we're eventually in the kernel, clearing bitmaps, correct?


Indeed, memory listener registration/removal doesn't happen very frequently, while traversing the listeners can happen very often in migration code IIUC.


--
Thanks,

David / dhildenb




reply via email to

[Prev in Thread] Current Thread [Next in Thread]