qemu-block



From: Pierre Libeau
Subject: RE: TR: Openstack NOVA - Improve the time of file system freeze during live-snapshot
Date: Thu, 20 Jan 2022 14:50:17 +0000

About the context:

In my case the file format is raw, but it can also be qcow2.


You are right: in Nova this is not really a "snapshot" but an image of the instance.

The goal is to then upload this image to Glance, where it is stored and can be used to create a new instance or to rebuild an existing instance.


You are also right that the result of "dev.rebase" is a mirror of the disk.


So my question is: do I break anything by moving "Freeze guest filesystems" (step 2 in your process) to just before "Cancel the mirror job" (step 3c in your process)? I have tested it and it works, but I would prefer to have your opinion.


As for your question about whether the QEMU 1.3 bug is still a valid reason to do it this way, I will check with the Nova community. I am a beginner in this area, and I think your question is a very good one.


Pierre

Public Cloud - VPS



From: Kevin Wolf <kwolf@redhat.com>
Sent: Thursday, 20 January 2022 12:45
To: Pierre Libeau
Cc: qemu-block@nongnu.org; qemu-devel@nongnu.org; kchamart@redhat.com; pkrempa@redhat.com; eblake@redhat.com
Subject: Re: TR: Openstack NOVA - Improve the time of file system freeze during live-snapshot
 
On 20.01.2022 at 09:02, Pierre Libeau wrote:
> Hello
>
> I'm forwarding my question to you because I originally posted it to the
> wrong mailing list. Could you give me your opinion or point me to the
> right people who can help me?
>
> Thx.
>
> Pierre
>
>
> ________________________________
> From: Qemu-discuss <qemu-discuss-bounces+pierre.libeau=corp.ovh.com@nongnu.org> on behalf of Pierre Libeau <pierre.libeau@ovhcloud.com>
> Sent: Monday, 17 January 2022 08:43
> To: qemu-discuss@nongnu.org
> Subject: Openstack NOVA - Improve the time of file system freeze during live-snapshot
>
>
> Hello,
>
> I'm working on a patch in nova to reduce the duration of the file system
> freeze during live-snapshot on an instance with a local disk, and I
> need your opinion about the solution I would like to propose.
>
> My issue during the live snapshot is the duration of the file system
> freeze on an instance with a big local disk. [1]
>
> In my case the instance has a local disk (400 GB) and the
> qemu-guest-agent is installed.
>
> Nova's process looks like this: [2]
> dev = guest.get_block_device(disk_path)
>
> 1. guest.freeze_filesystems()
> 2. dev.rebase(disk_delta, copy=True, reuse_ext=True, shallow=True)
> 3. while not dev.is_job_complete()  # wait for the end of mirroring (the
>    issue is here: the waiting time depends on the size of the disk and
>    the IOPS)
> 4. dev.abort_job()
> 5. guest.thaw_filesystems()

So first of all, I have to do some translation of terminology which
seems to be different from what I am used to.

dev.rebase with copy=True seems to result in a mirror block job in QEMU?

So what you're calling a snapshot here doesn't seem to be a differential
snapshot (e.g. by adding a COW overlay), but a full copy that results in
two fully independent, standalone images. Is this right?

Adding a bit more context, the whole process seems to be:

1. Create a qcow2 for the copy of the top layer that shares the backing
   file with the active image.

2. Freeze guest filesystems

3. Create a full copy of the active layer (into the new qcow2 file)
    a. Start a mirror job
    b. Wait for the mirror job to move to the READY state
    c. Cancel the mirror job with force=false, i.e. complete the mirror
       job without changing the active image of the VM

4. Thaw the guest filesystems

5. qemu-img convert the copied top layer with its full backing chain
   to a standalone raw image

6. Delete the temporary qcow2 copy

> My proposal is to move the freeze so that it happens after the mirroring
> has finished and before the mirror job is stopped. [3] I have tried this
> on an instance, and the last file written to the fs corresponds to the
> end of the mirror.

Yes, you only need the freeze around the mirror job completion, that is,
step 3c above.
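
A minimal sketch of that reordering, written against the method names quoted above (freeze_filesystems, rebase, is_job_complete, abort_job, thaw_filesystems). FakeGuest and FakeBlockDevice are hypothetical stand-ins that only record the call order; they are not Nova's or libvirt's real classes, and the path and polling behaviour are assumptions for illustration.

```python
import time


class FakeGuest:
    """Hypothetical stand-in for the guest object; only records calls."""

    def __init__(self, log):
        self.log = log

    def freeze_filesystems(self):
        self.log.append("freeze")

    def thaw_filesystems(self):
        self.log.append("thaw")


class FakeBlockDevice:
    """Hypothetical stand-in for the block device; 'ready' after N polls."""

    def __init__(self, log, polls_until_ready=3):
        self.log = log
        self.polls_left = polls_until_ready

    def rebase(self, target, copy=True, reuse_ext=True, shallow=True):
        self.log.append("rebase")

    def is_job_complete(self):
        self.log.append("poll")
        self.polls_left -= 1
        return self.polls_left <= 0

    def abort_job(self):
        self.log.append("abort")


def live_snapshot(guest, dev, disk_delta, poll_interval=0.0):
    # Start the mirror and wait for it while the guest keeps running.
    dev.rebase(disk_delta, copy=True, reuse_ext=True, shallow=True)
    while not dev.is_job_complete():
        time.sleep(poll_interval)
    # Freeze only around the completion of the mirror job (step 3c).
    guest.freeze_filesystems()
    try:
        dev.abort_job()
    finally:
        # Always thaw, even if stopping the job raises.
        guest.thaw_filesystems()


log = []
live_snapshot(FakeGuest(log), FakeBlockDevice(log), "/tmp/disk.delta")
print(log)
```

The try/finally is a design choice of this sketch, not something taken from the existing code: it keeps the guest from staying frozen if stopping the job fails.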

However, the whole process seems very complicated for a rather simple
operation. A comment mentions that the dance with the temporary qcow2
file is because of a (not further specified) bug in QEMU 1.3. I believe
libvirt hasn't supported a QEMU version that old for a while, so is this
really still a valid reason?

But what I would actually have used is a backup block job, which makes
sure that the copy will contain the disk content at the point of time
when the block job was started rather than when it happened to complete.
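
As a rough illustration of the backup-based alternative, the sketch below only builds the JSON commands in the order they might be issued; it does not talk to QEMU. It assumes the freeze is then needed only around the start of the job, since the copy reflects the disk content at that point. The node names "disk0" and "snapshot-target" and the job id are hypothetical, the target node is assumed to exist already, and waiting for the job to finish is left out. The freeze and thaw commands go to the guest agent, not to the QEMU monitor.

```python
import json


def backup_snapshot_commands(device, target, job_id="nova-snapshot"):
    """Return (channel, command) pairs in the order they would be issued.

    'qga' marks guest agent commands, 'qmp' marks QEMU monitor commands.
    """
    return [
        # Quiesce the guest just before the point-in-time copy starts.
        ("qga", {"execute": "guest-fsfreeze-freeze"}),
        # Start a full backup job; node names here are hypothetical.
        ("qmp", {
            "execute": "blockdev-backup",
            "arguments": {
                "job-id": job_id,
                "device": device,
                "target": target,
                "sync": "full",
            },
        }),
        # Thaw right away; the job keeps copying in the background.
        ("qga", {"execute": "guest-fsfreeze-thaw"}),
    ]


commands = backup_snapshot_commands("disk0", "snapshot-target")
for channel, command in commands:
    print(channel, json.dumps(command))
```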

I'm adding a few more people to CC who may have additional comments on
this.

Kevin

