From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v0 0/7] Background snapshots
Date: Tue, 3 Jul 2018 13:54:47 +0800
User-agent: Mutt/1.10.0 (2018-05-17)

On Mon, Jul 02, 2018 at 03:40:31PM +0300, Denis Plotnikov wrote:
> 
> 
> On 02.07.2018 14:23, Peter Xu wrote:
> > On Fri, Jun 29, 2018 at 11:03:13AM +0300, Denis Plotnikov wrote:
> > > The patch set adds the ability to make external snapshots while the VM
> > > is running.
> > 
> > Hi, Denis,
> > 
> > This work is interesting, though I have a few questions to ask in
> > general below.
> > 
> > > 
> > > The workflow to make a snapshot is the following:
> > > 1. Pause the VM
> > > 2. Make a snapshot of block devices using the scheme of your choice
> > 
> > Here you explicitly took the snapshot for the block device, then...
> > 
> > > 3. Turn on background-snapshot migration capability
> > > 4. Start the migration using the destination (migration stream) of your 
> > > choice.
> > 
> > ... here you started the VM snapshot.  How did you make sure that the
> > VM snapshot (e.g., the RAM data) and the block snapshot will be
> > aligned?
> As the VM has been paused before making the image (disk) snapshot, no
> requests should go to the original image from then on. All later
> requests go to the disk snapshot.
> 
> At that point we have a disk image and its snapshot.
> The image holds a kind of checkpointed state which won't (shouldn't) be
> changed, because all write requests go to the image snapshot.
> 
> Then we start the background snapshot, which marks all the memory as
> read-only and writes the devices' state to the VM snapshot file.
> By making the memory read-only we effectively freeze the state of the RAM.
> 
> At that point we have the original image and the VM memory content,
> which correspond to each other because the VM isn't running.
> 
> Then the background snapshot thread resumes VM execution while the
> read-only-marked memory is being written to the external VM snapshot
> file. All write accesses to the memory are intercepted, and the pages
> being accessed are written to the VM snapshot (VM state) file with
> priority. Since each page is marked read-write again right after being
> written, it is no longer tracked for later accesses.
> 
> This is how we guarantee that the VM snapshot (state) file has the
> memory content corresponding to the moment when the disk snapshot was
> created.
> 
> When the writing ends, we have a VM snapshot (VM state) file whose
> memory content dates from the moment the image snapshot was created.
> 
> So, to restore the VM from "the snapshot", we use the original disk
> image (not the disk snapshot) together with the VM snapshot (VM state
> with saved memory) file.

My bad for not noticing the implication of vm_stop() as the first
step.  Your explanation is clear.  Thank you!
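
A minimal, self-contained sketch of that interception trick (illustrative
only; none of the names below come from the series): guest RAM is
mprotect()ed read-only, and a SIGSEGV handler saves the faulting page,
unprotects it, and lets the write retry.

    #include <signal.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static uint8_t *ram;            /* stand-in for a guest RAM block */
    static size_t ram_size;
    static long pagesz;

    /* Stand-in for pushing a frozen page into the snapshot stream. */
    static void save_page(uint8_t *page)
    {
        (void)page;
        write(STDERR_FILENO, "page saved\n", 11);
    }

    static void wp_handler(int sig, siginfo_t *si, void *ctx)
    {
        uint8_t *a = si->si_addr;

        (void)sig;
        (void)ctx;
        if (a >= ram && a < ram + ram_size) {
            uint8_t *page =
                (uint8_t *)((uintptr_t)a & ~((uintptr_t)pagesz - 1));

            save_page(page);        /* save before the write dirties it */
            /* Unprotect: this page is written out and no longer tracked.
             * (mprotect() is not formally async-signal-safe -- one reason
             * userfaultfd is the cleaner backend in the long run.) */
            mprotect(page, pagesz, PROT_READ | PROT_WRITE);
            return;                 /* the faulting write is retried */
        }
        abort();                    /* a genuine crash, not our tracking */
    }

    int main(void)
    {
        struct sigaction sa;

        pagesz = sysconf(_SC_PAGESIZE);
        ram_size = 16 * pagesz;
        ram = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (ram == MAP_FAILED)
            return 1;

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = wp_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        mprotect(ram, ram_size, PROT_READ);  /* freeze all of "RAM" */
        ram[3 * pagesz] = 42;   /* faults once, page saved, write retried */
        return 0;
    }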

> 
> > 
> > For example, in current save_snapshot() we'll quiesce disk IOs before
> > migrating the last pieces of RAM data to make sure they are aligned.
> > I didn't figure out myself on how that's done in this work.
> > 
> > >     The migration will resume VM execution by itself when it has the
> > >     devices' states saved and is ready to start writing RAM to the
> > >     migration stream.
> > > 5. Listen to the migration finish event
> > > 
> > > The feature relies on an as-yet-unapplied KVM ability to report the
> > > faulting address. Please find the KVM patch snippet that makes the
> > > patch set work below:
> > > 
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -XXXX,X +XXXX,XX @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> > >         vcpu->arch.exit_qualification = exit_qualification;
> > > -       return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
> > > +       r = kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
> > > +       if (r == -EFAULT) {
> > > +               unsigned long hva = kvm_vcpu_gfn_to_hva(vcpu, gpa >> PAGE_SHIFT);
> > > +
> > > +               vcpu->run->exit_reason = KVM_EXIT_FAIL_MEM_ACCESS;
> > > +               vcpu->run->hw.hardware_exit_reason = EXIT_REASON_EPT_VIOLATION;
> > > +               vcpu->run->fail_mem_access.hva = hva | (gpa & (PAGE_SIZE-1));
> > > +               r = 0;
> > > +
> > > +       }
> > > +       return r;
> > 
> > Just to make sure I fully understand here: so this is some extra KVM
> > work just to make sure the mprotect() trick will work even for KVM
> > vcpu threads, am I right?
> 
> That's correct!
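
To spell out what that buys: with the patch applied, a vcpu write to a
protected page no longer raises SIGSEGV in the vcpu thread; KVM_RUN
returns to userspace with the faulting host virtual address instead, and
the same save-then-unprotect step runs there. A rough sketch of that
userspace side follows -- KVM_EXIT_FAIL_MEM_ACCESS and fail_mem_access
exist only with the unapplied KVM patch, so the constant's value and the
struct below are mocks, not real uapi:

    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define KVM_EXIT_FAIL_MEM_ACCESS 0x20   /* hypothetical value */

    struct kvm_run_mock {                   /* stand-in for struct kvm_run */
        uint32_t exit_reason;
        struct { uint64_t hva; } fail_mem_access;
    };

    /* Stand-in for pushing the frozen page into the snapshot stream. */
    static void save_page_to_snapshot(void *hva) { (void)hva; }

    /* Called from the vcpu thread when KVM_RUN returns.  Returns 1 if the
     * exit was a tracked write fault we resolved (the caller just
     * re-enters the guest), 0 for any other exit reason. */
    static int handle_wp_exit(struct kvm_run_mock *run)
    {
        uint64_t pagesz = (uint64_t)sysconf(_SC_PAGESIZE);
        void *page;

        if (run->exit_reason != KVM_EXIT_FAIL_MEM_ACCESS)
            return 0;

        page = (void *)(uintptr_t)(run->fail_mem_access.hva & ~(pagesz - 1));
        save_page_to_snapshot(page);        /* save before unprotecting */
        mprotect(page, pagesz, PROT_READ | PROT_WRITE);
        return 1;
    }
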
> > 
> > Meanwhile, I see that you only modified the EPT violation code; what
> > about legacy hardware and the softmmu case?
> 
> I didn't check it thoroughly, but the scheme works in TCG mode.

Yeah, I guess TCG will work, since the SIGSEGV handler covers that
case.  I meant the shadow MMU implementation in KVM when
kvm_intel.ept=0 is set on the host.  But of course that's not a big
deal for now, since it can be discussed in the KVM counterpart of the
work.  Meanwhile, considering that this series seems to provide a
general framework for live snapshots, the work is meaningful no matter
what backend magic is used (either mprotect, or userfaultfd in the
future).
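
For completeness: userfaultfd's write-protect mode was not in the kernel
when this thread was written (it landed much later, in Linux 5.7), but on
a kernel that has it, the arming side of such a backend could look
roughly like the sketch below; the fault-event reading loop is omitted.

    #include <fcntl.h>
    #include <linux/userfaultfd.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Register [addr, addr+len) for write-protect faults and arm WP on
     * the whole range.  Returns the userfaultfd to poll, or -1. */
    static int wp_protect_region(void *addr, uint64_t len)
    {
        struct uffdio_api api = { .api = UFFD_API };
        struct uffdio_register reg = {
            .range = { .start = (uint64_t)(uintptr_t)addr, .len = len },
            .mode  = UFFDIO_REGISTER_MODE_WP,
        };
        struct uffdio_writeprotect wp = {
            .range = { .start = (uint64_t)(uintptr_t)addr, .len = len },
            .mode  = UFFDIO_WRITEPROTECT_MODE_WP,
        };
        int uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

        if (uffd < 0)
            return -1;
        if (ioctl(uffd, UFFDIO_API, &api) ||         /* handshake */
            ioctl(uffd, UFFDIO_REGISTER, &reg) ||    /* track the range... */
            ioctl(uffd, UFFDIO_WRITEPROTECT, &wp)) { /* ...and arm WP */
            close(uffd);
            return -1;
        }
        /* Each write fault now arrives as an event on uffd; it is resolved
         * by saving the page and clearing WP for it (same ioctl, mode 0),
         * which also wakes the faulting thread. */
        return uffd;
    }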

Thanks,

-- 
Peter Xu


