Re: Question on dirty sync before kvm memslot removal
From: Paolo Bonzini
Subject: Re: Question on dirty sync before kvm memslot removal
Date: Mon, 30 Mar 2020 15:11:53 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0
On 27/03/20 16:04, Peter Xu wrote:
> That makes perfect sense to me, however... what if the vcpu generates
> dirty bits _after_ we do KVM_GET_DIRTY_LOG but _before_ we send
> KVM_SET_USER_MEMORY_REGION to remove the memslot? If the vcpu is in
> userspace I think it's fine, because the BQL is needed so it won't be
> able to dirty memory; however, vcpus running inside KVM are not
> restricted by that. I think those dirty bits will still get lost,
> though it should be extremely hard to trigger.
Yes, you've found a bug.
> I'm not sure whether I missed something above, but if I'm correct, I
> think the solution could be a flag for KVM_SET_USER_MEMORY_REGION that
> marks the memslot as invalid (say, KVM_MEM_INVALID); then, when
> removing a memslot that has KVM_MEM_LOG_DIRTY_PAGES enabled, we should:
>
> - send KVM_SET_USER_MEMORY_REGION with KVM_MEM_INVALID to invalidate
> the memslot, but keep the slot and bitmap in KVM
>
> - send KVM_GET_DIRTY_LOG to fetch the bitmap for the slot
>
> - send KVM_SET_USER_MEMORY_REGION with size==0 to remove the slot
Or KVM_MEM_READONLY.
> However I don't know whether that'll be worth it.
Yes, especially in the light of the dirty ring issue below.
> (Side question which is irrelevant to this: for the kvm dirty ring we
> now need to do a similar thing to flush dirty bits before removing a
> memslot, however that's even trickier because flushing the dirty rings
> needs to kick all vcpus out. Currently the RFC series is using
> run_on_cpu(), which releases the BQL and waits for all vcpus to
> exit to userspace; however, that cannot be done inside
> kvm_set_phys_mem because it needs the BQL. I'm still thinking about
> a good way to fix this, but any idea is greatly welcomed :)
The problem here is also that the GFN is not a unique identifier of the
QEMU ram_addr_t. However, you don't really need to kick all vCPUs out,
do you? You can protect the dirty ring with its own per-vCPU mutex and
harvest the pages from the main thread.
Paolo