qemu-devel

From: Daniel P . Berrangé
Subject: Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram
Date: Tue, 18 Apr 2023 17:58:44 +0100
User-agent: Mutt/2.2.9 (2022-11-12)

On Fri, Mar 31, 2023 at 12:27:48PM -0400, Peter Xu wrote:
> On Fri, Mar 31, 2023 at 05:10:16PM +0100, Daniel P. Berrangé wrote:
> > On Fri, Mar 31, 2023 at 11:55:03AM -0400, Peter Xu wrote:
> > > On Fri, Mar 31, 2023 at 12:30:45PM -0300, Fabiano Rosas wrote:
> > > > Peter Xu <peterx@redhat.com> writes:
> > > > 
> > > > > On Fri, Mar 31, 2023 at 11:37:50AM -0300, Fabiano Rosas wrote:
> > > > >> >> Outgoing migration to file. NVMe disk. XFS filesystem.
> > > > >> >> 
> > > > > >> >> - Single migration runs of stopped 32G guest with ~90% RAM usage.
> > > > > >> >>   Guest running `stress-ng --vm 4 --vm-bytes 90% --vm-method all
> > > > > >> >>   --verify -t 10m -v`:
> > > > >> >> 
> > > > >> >> migration type  | MB/s | pages/s |  ms
> > > > >> >> ----------------+------+---------+------
> > > > >> >> savevm io_uring |  434 |  102294 | 71473
> > > > >> >
> > > > > >> > So I assume this is the non-live migration scenario.  Could you
> > > > > >> > explain what does io_uring mean here?
> > > > >> >
> > > > >> 
> > > > > >> This table is all non-live migration. This particular line is a
> > > > > >> snapshot (hmp_savevm->save_snapshot). I thought it could be relevant
> > > > > >> because it is another way by which we write RAM into disk.
> > > > >
> > > > > > I see, so if all non-live that explains, because I was curious what's
> > > > > > the relationship between this feature and the live snapshot that QEMU
> > > > > > also supports.
> > > > > >
> > > > > > I also don't immediately see why savevm will be much slower, do you
> > > > > > have an answer?  Maybe it's somewhere but I just overlooked..
> > > > >
> > > > 
> > > > I don't have a concrete answer. I could take a jab and maybe blame the
> > > > extra memcpy for the buffer in QEMUFile? Or perhaps an unintended effect
> > > > of bandwidth limits?
> > > 
> > > IMHO it would be great if this can be investigated and reasons provided in
> > > the next cover letter.
> > > 
> > > > 
> > > > > > IIUC this is "vm suspend" case, so there's an extra benefit knowledge
> > > > > > of "we can stop the VM".  It smells slightly weird to build this on
> > > > > > top of "migrate" from that pov, rather than "savevm", though.  Any
> > > > > > thoughts on this aspect (on why not building this on top of "savevm")?
> > > > >
> > > > 
> > > > I share the same perception. I have done initial experiments with
> > > > savevm, but I decided to carry on the work that was already started by
> > > > others because my understanding of the problem was yet incomplete.
> > > > 
> > > > One point that has been raised is that the fixed-ram format alone does
> > > > not bring that many performance improvements. So we'll need
> > > > multi-threading and direct-io on top of it. Re-using multifd
> > > > infrastructure seems like it could be a good idea.
> > > 
> > > The thing is IMHO concurrency is not as hard if VM stopped, and when we're
> > > 100% sure locally on where the page will go.
> > 
> > We shouldn't assume the VM is stopped though. When saving to the file
> > the VM may still be active. The fixed-ram format lets us re-write the
> > same memory location on disk multiple times in this case, thus avoiding
> > growth of the file size.
> 
> Before discussing on reusing multifd below, now I have a major confusing on
> the use case of the feature..
> 
> The question is whether we would like to stop the VM after fixed-ram
> migration completes.  I'm asking because:
> 
>   1. If it will stop, then it looks like a "VM suspend" to me. If so, could
>      anyone help explain why we don't stop the VM first then migrate?
>      Because it avoids copying single pages multiple times, no fiddling
>      with dirty tracking at all - we just don't ever track anything.  In
>      short, we'll stop the VM anyway, then why not stop it slightly
>      earlier?
> 
>   2. If it will not stop, then it's "VM live snapshot" to me.  We have
>      that, aren't we?  That's more efficient because it'll wr-protect all
>      guest pages, any write triggers a CoW and we only copy the guest pages
>      once and for all.
> 
> Either way to go, there's no need to copy any page more than once.  Did I
> miss anything perhaps very important?
> 
> I would guess it's option (1) above, because it seems we don't snapshot the
> disk alongside.  But I am really not sure now..

It is both options above.

Libvirt has multiple APIs where it currently uses its migrate-to-file
approach:

  * virDomainManagedSave()

    This saves VM state to a libvirt-managed file and stops the VM. The
    file state is auto-restored on the next request to start the VM, and
    the file is then deleted. The VM CPUs are stopped during both the
    save and restore phases.

  * virDomainSave/virDomainRestore

    The former saves VM state to a file specified by the mgmt app/user.
    A later call to virDomainRestore starts the VM using that saved
    state. The mgmt app / user can delete the file state, or re-use
    it as many times as they desire. The VM CPUs are stopped during both
    the save and restore phases.

  * virDomainSnapshotXXX

    This family of APIs takes snapshots of the VM disks, optionally
    also saving the full VM state to a separate file. The snapshots
    can later be restored. The VM CPUs remain running during the
    save phase, but are stopped during the restore phase.
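
As a rough summary of the three cases above, here is a minimal Python
sketch. The names SaveApi, APIS and cpu_states_during_save are
hypothetical and exist only for illustration; they are not libvirt or
QEMU identifiers. The CPU states are taken from the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaveApi:
    """Illustrative record only; not a libvirt data structure."""
    name: str
    cpus_running_during_save: bool
    cpus_running_during_restore: bool


# CPU states as described in the list above.
APIS = [
    SaveApi("virDomainManagedSave", False, False),
    SaveApi("virDomainSave/virDomainRestore", False, False),
    SaveApi("virDomainSnapshotXXX", True, False),
]


def cpu_states_during_save(apis):
    """Return the set of CPU states a shared save path has to handle."""
    return {api.cpus_running_during_save for api in apis}
```

Calling cpu_states_during_save(APIS) yields both True and False, so a
single save path shared by these APIs has to cope with running and
stopped CPUs.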

All these APIs end up calling the same code inside libvirt, which uses
the libvirt-iohelper together with the QEMU migrate:fd driver.

IIUC, SUSE's original motivation for the performance improvements was
wrt the first case, virDomainManagedSave. From the POV of actually
supporting this in libvirt though, we need to cover all the scenarios
there. Thus we need this to work both when CPUs are running and when
they are stopped. If we didn't use migrate for this, we would basically
just end up re-inventing migrate, which IMHO is undesirable from both
libvirt's POV and QEMU's POV.
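
For illustration only, the file-growth point made earlier in the thread
about the fixed-ram format with running CPUs can be sketched in Python.
This is a toy model, not QEMU's actual on-disk layout: PAGE_SIZE, the
8-byte page index header in the stream case, and all function names are
assumptions made for the sketch. It compares appending every dirtied
page to a stream against writing each page at a fixed offset.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size for this sketch


def save_stream(path, dirty_rounds, pages):
    """Append each dirtied page with an index header (stream-style)."""
    with open(path, "wb") as f:
        for dirty in dirty_rounds:
            for idx in dirty:
                f.write(idx.to_bytes(8, "little"))  # hypothetical header
                f.write(pages[idx])
    return os.path.getsize(path)


def save_fixed(path, dirty_rounds, pages):
    """Write each dirtied page at offset idx * PAGE_SIZE (fixed-style)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for dirty in dirty_rounds:
            for idx in dirty:
                # Re-sending a page overwrites its previous copy in place.
                os.pwrite(fd, pages[idx], idx * PAGE_SIZE)
    finally:
        os.close(fd)
    return os.path.getsize(path)


def demo():
    """Save 8 pages, then re-send a few as if the guest dirtied them."""
    num_pages = 8
    pages = [bytes([i]) * PAGE_SIZE for i in range(num_pages)]
    dirty_rounds = [list(range(num_pages)), [1, 3, 5], [3]]
    with tempfile.TemporaryDirectory() as d:
        stream_size = save_stream(os.path.join(d, "stream.bin"),
                                  dirty_rounds, pages)
        fixed_size = save_fixed(os.path.join(d, "fixed.bin"),
                                dirty_rounds, pages)
    return stream_size, fixed_size
```

In this toy model the stream file grows with every re-sent page, while
the fixed-offset file stays at num_pages * PAGE_SIZE no matter how many
times a page is dirtied.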

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



