From: Peter Xu
Subject: Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram
Date: Fri, 21 Apr 2023 09:56:36 -0400
On Fri, Apr 21, 2023 at 08:48:02AM +0100, Daniel P. Berrangé wrote:
> On Thu, Apr 20, 2023 at 03:19:39PM -0400, Peter Xu wrote:
> > On Thu, Apr 20, 2023 at 10:02:43AM +0100, Daniel P. Berrangé wrote:
> > > On Wed, Apr 19, 2023 at 03:07:19PM -0400, Peter Xu wrote:
> > > > On Wed, Apr 19, 2023 at 06:12:05PM +0100, Daniel P. Berrangé wrote:
> > > > > On Tue, Apr 18, 2023 at 03:26:45PM -0400, Peter Xu wrote:
> > > > > > On Tue, Apr 18, 2023 at 05:58:44PM +0100, Daniel P. Berrangé wrote:
> > > > > > > Libvirt has multiple APIs where it currently uses its
> > > > > > > migrate-to-file approach
> > > > > > >
> > > > > > > * virDomainManagedSave()
> > > > > > >
> > > > > > > This saves VM state to a libvirt-managed file, stops the VM,
> > > > > > > and the file state is auto-restored on the next request to
> > > > > > > start the VM, and the file is then deleted. The VM CPUs are
> > > > > > > stopped during both the save + restore phases.
> > > > > > >
> > > > > > > * virDomainSave/virDomainRestore
> > > > > > >
> > > > > > > The former saves VM state to a file specified by the mgmt
> > > > > > > app/user. A later call to virDomainRestore starts the VM using
> > > > > > > that saved state. The mgmt app/user can delete the file state,
> > > > > > > or re-use it as many times as they desire. The VM CPUs are
> > > > > > > stopped during both the save + restore phases.
> > > > > > >
> > > > > > > * virDomainSnapshotXXX
> > > > > > >
> > > > > > > This family of APIs takes snapshots of the VM disks,
> > > > > > > optionally also including the full VM state to a separate
> > > > > > > file. The snapshots can later be restored. The VM CPUs remain
> > > > > > > running during the save phase, but are stopped during the
> > > > > > > restore phase.
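
For reference, a minimal C sketch of how a management application drives
these three API families (a sketch only; the domain name and file paths
are illustrative, and error handling is trimmed):

  /* Build with: gcc demo.c $(pkg-config --cflags --libs libvirt) */
  #include <stdio.h>
  #include <libvirt/libvirt.h>

  int main(void)
  {
      virConnectPtr conn = virConnectOpen("qemu:///system");
      if (!conn) { fprintf(stderr, "failed to connect\n"); return 1; }

      /* "demo-guest" is an illustrative domain name. */
      virDomainPtr dom = virDomainLookupByName(conn, "demo-guest");
      if (!dom) { fprintf(stderr, "no such domain\n"); virConnectClose(conn); return 1; }

      /* virDomainManagedSave(): libvirt picks the state file and restores
       * (then deletes) it on the next start of the domain. */
      if (virDomainManagedSave(dom, 0) < 0)
          fprintf(stderr, "managed save failed\n");

      /* The other two families look like this (only one of these
       * operations makes sense per run, since managed save stops the
       * domain):
       *   virDomainSave(dom, "/var/tmp/demo-guest.sav");
       *   virDomainRestore(conn, "/var/tmp/demo-guest.sav");
       *   virDomainSnapshotCreateXML(dom,
       *       "<domainsnapshot><name>snap1</name></domainsnapshot>", 0);
       */

      virDomainFree(dom);
      virConnectClose(conn);
      return 0;
  }
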
> > > > > >
> > > > > > For this one IMHO it'll be good if Libvirt can consider
> > > > > > leveraging the new background-snapshot capability (QEMU 6.0+, so
> > > > > > not very new..). Or is there perhaps any reason why a generic
> > > > > > migrate:fd approach is better?
> > > > >
> > > > > I'm not sure I fully understand the implications of
> > > > > 'background-snapshot'?
> > > > >
> > > > > Based on what the QAPI comment says, it sounds potentially
> > > > > interesting, as conceptually it would be nicer to have the
> > > > > memory / state snapshot represent the VM at the point where we
> > > > > started the snapshot operation, rather than where we finished the
> > > > > snapshot operation.
> > > > >
> > > > > It would not solve the performance problems that the work in this
> > > > > thread was intended to address though. With large VMs (100's of GB
> > > > > of RAM), saving all the RAM state to disk takes a very long time,
> > > > > regardless of whether the VM vCPUs are paused or running.
> > > >
> > > > I think it solves the performance problem by copying each guest
> > > > page only once, even if the guest is running.
> > >
> > > I think we're talking about different performance problems.
> > >
> > > What you describe here is about ensuring the snapshot is of finite size
> > > and completes in linear time, by ensuring each page is written only
> > > once.
> > >
> > > What I'm talking about is being able to parallelize the writing of
> > > all RAM, so if a single thread cannot saturate the storage, using
> > > multiple threads will make the overall process faster, even when
> > > we're only writing each page once.
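
A minimal sketch of the idea, not the QEMU multifd code: with a fixed-ram
layout every page has a known offset in the output file, so multiple
writer threads can issue pwrite() calls to disjoint regions concurrently
without any coordination. The page count, thread count, and file name
below are arbitrary.

  /* Build with: cc -pthread sketch.c */
  #include <fcntl.h>
  #include <pthread.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  #define PAGE_SIZE   4096
  #define NR_PAGES    1024          /* pretend "guest RAM": 4 MiB */
  #define NR_THREADS  4

  static int out_fd;
  static unsigned char ram[NR_PAGES][PAGE_SIZE];

  struct worker { int id; };

  /* Each thread writes its own slice of pages at fixed file offsets,
   * so no locking or ordering between writers is needed. */
  static void *write_slice(void *opaque)
  {
      struct worker *w = opaque;
      int pages_per_thread = NR_PAGES / NR_THREADS;
      int first = w->id * pages_per_thread;

      for (int i = first; i < first + pages_per_thread; i++) {
          off_t offset = (off_t)i * PAGE_SIZE;   /* fixed location of page i */
          if (pwrite(out_fd, ram[i], PAGE_SIZE, offset) != PAGE_SIZE) {
              perror("pwrite");
              exit(1);
          }
      }
      return NULL;
  }

  int main(void)
  {
      pthread_t threads[NR_THREADS];
      struct worker workers[NR_THREADS];

      memset(ram, 0xaa, sizeof(ram));            /* fake guest RAM contents */
      out_fd = open("ram-image.bin", O_CREAT | O_WRONLY | O_TRUNC, 0600);
      if (out_fd < 0) { perror("open"); return 1; }

      for (int i = 0; i < NR_THREADS; i++) {
          workers[i].id = i;
          pthread_create(&threads[i], NULL, write_slice, &workers[i]);
      }
      for (int i = 0; i < NR_THREADS; i++)
          pthread_join(threads[i], NULL);

      close(out_fd);
      return 0;
  }
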
> >
> > It depends on how much we want it. Here the live snapshot scenario
> > could probably leverage the same multi-threading framework as a vm
> > suspend case, because it can assume all the pages are static and only
> > saved once.
> >
> > But I agree it's at least not there yet.. so we can directly leverage
> > multifd at least for now.
> >
> > >
> > > > Unlike most of the other "migrate" use cases, background snapshot
> > > > does not use the generic dirty tracking at all (for KVM that's
> > > > get-dirty-log); instead it uses userfaultfd wr-protects, so that
> > > > when taking the snapshot all the guest pages are protected once.
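
A minimal standalone sketch of the userfaultfd write-protect mechanism
that background-snapshot relies on (not QEMU's implementation): it
registers a buffer in WP mode and write-protects it. This assumes a Linux
kernel with userfaultfd write-protect support for anonymous memory (5.7+)
and permission to create userfaultfd descriptors; the fault-handling
thread that would copy each page out before removing the protection is
omitted.

  #include <fcntl.h>
  #include <linux/userfaultfd.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  int main(void)
  {
      size_t len = 2 * 1024 * 1024;
      void *area = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (area == MAP_FAILED) { perror("mmap"); return 1; }

      int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
      if (uffd < 0) { perror("userfaultfd"); return 1; }

      struct uffdio_api api = { .api = UFFD_API };
      if (ioctl(uffd, UFFDIO_API, &api) < 0) { perror("UFFDIO_API"); return 1; }

      /* Register the region so write faults are reported in WP mode. */
      struct uffdio_register reg = {
          .range = { .start = (unsigned long)area, .len = len },
          .mode  = UFFDIO_REGISTER_MODE_WP,
      };
      if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0) {
          perror("UFFDIO_REGISTER"); return 1;
      }

      /* Write-protect the whole range in one go.  A snapshot writer
       * thread would then read fault events, copy each faulting page
       * into the snapshot, and resolve the fault by dropping the
       * protection (UFFDIO_WRITEPROTECT with mode = 0) so the vCPU
       * can continue. */
      struct uffdio_writeprotect wp = {
          .range = { .start = (unsigned long)area, .len = len },
          .mode  = UFFDIO_WRITEPROTECT_MODE_WP,
      };
      if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) < 0) {
          perror("UFFDIO_WRITEPROTECT"); return 1;
      }

      printf("%zu bytes at %p are now write-protected\n", len, area);
      return 0;
  }
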
> > >
> > > Oh, so that means this 'background-snapshot' feature only works on
> > > Linux, and only when permissions allow it. The migration parameter
> > > probably should be marked with 'CONFIG_LINUX' in the QAPI schema
> > > to make it clear this is a non-portable feature.
> >
> > Indeed, I can have a follow-up patch for this. But it'll be the same
> > as some other features, like postcopy (and all its sub-features,
> > including postcopy-blocktime and postcopy-preempt)?
> >
> > >
> > > > It guarantees the best efficiency of creating a snapshot with the
> > > > VM running, afaict. I sincerely think Libvirt should have someone
> > > > investigate whether virDomainSnapshotXXX() can be implemented with
> > > > this cap rather than the default migration.
> > >
> > > Since the background-snapshot feature is not universally available,
> > > it will only ever be possible to use it as an optional enhancement
> > > with virDomainSnapshotXXX; we'll need the portable impl to be the
> > > default / fallback.
> >
> > I am actually curious how a live snapshot can be implemented correctly
> > without something like background snapshot. I raised this question in
> > another reply here:
> >
> > https://lore.kernel.org/all/ZDWBSuGDU9IMohEf@x1n/
> >
> > I was using fixed-ram and vm suspend as an example, but I assume it
> > applies to any live snapshot that is based on the current default
> > migration scheme.
> >
> > For a real live snapshot (not vm suspend), IIUC we have similar challenges.
> >
> > The problem is that when migration completes (snapshot taken) the VM
> > is still running with a live disk image. Then how can we take a disk
> > snapshot at exactly the moment the guest image got mirrored into the
> > VM dump? What guarantees that there are no IO changes after the VM
> > image is created but before we take a snapshot of the disk image?
> >
> > In short, it's a question of how libvirt can make sure the VM image
> > and the disk snapshot image are taken at exactly the same point in
> > time for a live snapshot.
>
> It is just a matter of where you have the synchronization point.
>
> With background-snapshot, you have to snapshot the disks at the
> start of the migrate operation. Without background-snapshot
> you have to snapshot the disks at the end of the migrate
> operation. The CPUs are paused at the end of the migrate, so
> when the CPUs pause, initiate the storage snapshot in the
> background and then let the CPUs resume.
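
A sketch of that ordering with hypothetical helper functions standing in
for the real QMP/block-layer calls (they only log what the real steps
would be):

  #include <stdbool.h>
  #include <stdio.h>

  /* Hypothetical stand-ins for the real QMP / block-layer operations. */
  static void snapshot_disks(void)                 { puts("snapshot disks"); }
  static void enable_background_snapshot(void)     { puts("enable background-snapshot"); }
  static void start_migrate_to_file(const char *p) { printf("migrate VM state to %s\n", p); }
  static void wait_for_migration_completion(void)  { puts("wait for completion (vCPUs paused)"); }
  static void resume_vcpus(void)                   { puts("resume vCPUs"); }

  /* The synchronization point depends on whether background-snapshot
   * is used. */
  static void take_consistent_snapshot(const char *vmstate_path,
                                       bool background_snapshot)
  {
      if (background_snapshot) {
          /* RAM is write-protected when the operation starts, so the
           * saved state matches the guest at the start: snapshot the
           * disks first, then let the state save run in the background. */
          snapshot_disks();
          enable_background_snapshot();
          start_migrate_to_file(vmstate_path);
          wait_for_migration_completion();
      } else {
          /* Default migration converges at the end with vCPUs paused:
           * snapshot the disks at that point, then resume the guest. */
          start_migrate_to_file(vmstate_path);
          wait_for_migration_completion();
          snapshot_disks();
          resume_vcpus();
      }
  }

  int main(void)
  {
      take_consistent_snapshot("/tmp/vmstate.img", false);
      return 0;
  }
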
Ah, indeed.
Thanks.
--
Peter Xu