Summary on new backup interfaces in QEMU


From: Vladimir Sementsov-Ogievskiy
Subject: Summary on new backup interfaces in QEMU
Date: Tue, 15 Mar 2022 20:57:01 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0

Hi all!

Here I want to summarize new interfaces and use cases for backup in QEMU.

TODO for me: convert this into good rst documentation in docs/.

OK, let's begin.

First, note that the drive-backup QMP command is deprecated; use blockdev-backup instead.

Next, some terminology:

push backup: the whole process runs inside the QEMU process; also called
"internal backup"

pull backup: QEMU only exports a kind of snapshot (for example over NBD), and
third-party software reads this export and stores it somewhere; also called
"external backup"

copy-before-write operations: We usually back up an active disk: the guest is
running and may write to the disk while the backup is in progress. When the
guest wants to rewrite a data region that is not backed up yet, we must pause
that guest write and copy the original data somewhere else before letting the
guest write continue. That's a copy-before-write operation.

image-fleecing: the technique that allows exporting a "snapshotted" state of the
active disk with the help of copy-before-write operations. We create a temporary
image as the target for copy-before-write operations and provide an interface
for the user to read the "snapshotted" state. On read, data that has already
been changed in the original active disk is read from the temporary image, and
unchanged data is read directly from the active disk. The temporary image itself
is also called the "reverse delta" or "reversed delta".



== Simple push backup ==

Just use blockdev-backup, nothing new here. I'll just note some technical
details that are relatively new:

1. First, the backup job inserts a copy-before-write filter above the source
disk, to perform copy-before-write operations.
2. The created copy-before-write filter shares its internal block-copy state
with the backup job, so they work in collaboration and don't copy the same data
twice.
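
A minimal sketch of this in QMP wire syntax (node names disk0/target0, the job
id and the target filename are example values, not taken from the text above;
the target image is assumed to be created beforehand, e.g. with qemu-img
create):

  -> {"execute": "blockdev-add",
      "arguments": {"node-name": "target0", "driver": "qcow2",
                    "file": {"driver": "file", "filename": "full-backup.qcow2"}}}
  -> {"execute": "blockdev-backup",
      "arguments": {"job-id": "backup0", "device": "disk0",
                    "target": "target0", "sync": "full"}}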



== Full pull backup ==

Assume we are going to do incremental backups in the future, so we also need to
create a dirty bitmap, to track which parts of the active disk change after the
full backup.

1. Create an empty temporary image for fleecing. It must have the same size as
the active disk. It doesn't have to be qcow2, and if it is qcow2, you shouldn't
make the original active disk a backing file of the new temporary image (that
was necessary in the old fleecing scheme).

Example:
  qemu-img create -f qcow2 temp.qcow2 64G
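
If you don't know the exact virtual size, one way to query it is
query-named-block-nodes (a sketch; the response is heavily abbreviated and the
size shown is just an example):

  -> {"execute": "query-named-block-nodes"}
  <- {"return": [{"node-name": "disk0",
                  "image": {"virtual-size": 68719476736, ...},
                  ...}]}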


2. Initialize the fleecing scheme and create a dirty bitmap for future
incremental backups.

Assume disk0 is the active disk to be backed up, attached to the qdev with id sda.

qmp: transaction [
   block-dirty-bitmap-add {node: disk0, name: bitmap0, persistent: true}
   blockdev-add* {node-name: tmp-protocol, driver: file, filename: temp.qcow2}
   blockdev-add {node-name: tmp, driver: qcow2, file: tmp-protocol}
   blockdev-add {node-name: cbw, driver: copy-before-write, file: disk0, 
target: tmp}
   blockdev-replace** {parent-type: qdev, qdev-id: sda, new-child: cbw}
   blockdev-add {node-name: acc, driver: snapshot-access, file: cbw}
]

qmp: nbd-server-start {...}
qmp: nbd-server-add {device: acc, ...}
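
In full wire syntax this could look like the following sketch (the socket path
and export name are example values; writable: true is set on the assumption
that the client will use NBD_CMD_TRIM as described below):

  -> {"execute": "nbd-server-start",
      "arguments": {"addr": {"type": "unix",
                             "data": {"path": "/tmp/fleecing.sock"}}}}
  -> {"execute": "nbd-server-add",
      "arguments": {"device": "acc", "name": "fleecing", "writable": true}}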

This way we create the following block-graph:

[guest]                   [NBD export]
   |                            |
   | root                       | root
   v                 file       v
[copy-before-write]<------[snapshot-access]
   |           |
   | file      | target
   v           v
[active-disk] [temp.qcow2]

* "[PATCH 0/2] blockdev-add transaction" series needed for this
** "[PATCH v3 00/11] blockdev-replace" series needed for this


Note additional useful options for the copy-before-write filter:

"[PATCH 0/3] block: copy-before-write: on-cbw-error behavior" adds the option
on-cbw-error=break-snapshot, which means that on failure of a CBW operation we
don't break the guest write; instead, all further reads by the NBD client will
fail. Formally that means: break the backup process, not the guest write.

"[PATCH 0/4] block: copy-before-write: cbw-timeout" adds an option cbw-timeout,
to set a timeout for CBW operations. That's very useful to avoid the guest
getting stuck.
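
Assuming both series are applied, the cbw node from the transaction above might
be created with these options roughly like this (wire syntax; the 60-second
timeout is only an example value, and the option names follow the series
titles):

  -> {"execute": "blockdev-add",
      "arguments": {"node-name": "cbw", "driver": "copy-before-write",
                    "file": "disk0", "target": "tmp",
                    "on-cbw-error": "break-snapshot", "cbw-timeout": 60}}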


3. Now a third-party backup tool can read data from the NBD export.

The NBD_CMD_TRIM (discard) operation is supported on the export and has the
following effects:

1. the data is discarded from the temp image, if it is stored there
2. further copy-before-write operations for that area are avoided (the guest is
free to rewrite the corresponding data with no extra latency)
3. all further read requests from the discarded areas by the NBD client will fail

So, the NBD client may discard regions that are already backed up, to avoid
extra latency for guest writes and to free disk space on the host.

A possible TODO here is to implement an NBD protocol extension that allows READ
& DISCARD in one command. That would avoid an extra command on the wire, but we
lose the possibility of retrying the READ operation if it fails.

4. After the backup is complete, we should destroy the fleecing scheme:

qmp: nbd-server-stop

qmp: blockdev-del {node-name: acc}
qmp: blockdev-replace {parent-type: qdev, qdev-id: sda, new-child: disk0}
qmp: blockdev-del {node-name: cbw}
qmp: blockdev-del {node-name: tmp}
qmp: blockdev-del {node-name: tmp-protocol}


5. If the backup failed, we should remove the created dirty bitmap:

qmp: block-dirty-bitmap-remove {node: disk0, name: bitmap0}



== Incremental pull backup ==

OK, now we have a bitmap called bitmap0 and want to do an incremental backup
according to that bitmap. In short, we want to:

 - create a new bitmap to continue dirty tracking for the next incremental backup
 - export the "snapshotted" state of disk0 through NBD
 - export the "frozen" bitmap, so that the external tool knows what to copy

Mostly, all points remain the same; let's go through them:

1. Create an empty temporary image for fleecing -- same as for the full backup,
no difference.

2. Initialize the fleecing scheme; create a new dirty bitmap (bitmap1) for the
next incremental backup and disable bitmap0.

qmp: transaction [
   block-dirty-bitmap-add {node: disk0, name: bitmap1, persistent: true}
   block-dirty-bitmap-disable {node: disk0, name: bitmap0}
   blockdev-add {node-name: tmp-protocol, driver: file, filename: temp.qcow2}
   blockdev-add {node-name: tmp, driver: qcow2, file: tmp-protocol}
   blockdev-add {node-name: cbw, driver: copy-before-write, file: disk0, 
target: tmp, bitmap: {node: disk0, name: bitmap0}}
   blockdev-replace {parent-type: qdev, qdev-id: sda, new-child: cbw}
   blockdev-add {node-name: acc, driver: snapshot-access, file: cbw}
]

qmp: nbd-server-start {...}
qmp: block-export-add {type: nbd, node-name: acc, bitmaps: [{node: disk0, name: 
bitmap0}]}
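
In full wire syntax that export command might look like this (the export id and
name are example values; depending on the QEMU version / pending series, the
bitmaps entries may have to be plain bitmap-name strings like "bitmap0" instead
of the {node, name} form):

  -> {"execute": "block-export-add",
      "arguments": {"id": "fleecing0", "type": "nbd",
                    "node-name": "acc", "name": "fleecing0",
                    "bitmaps": [{"node": "disk0", "name": "bitmap0"}]}}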

3. Now a third-party backup tool can read data from the NBD export.

 - The client may negotiate meta contexts, to query the exported dirty bitmap
with the NBD_BLOCK_STATUS command
 - If the client reads "not-dirty" (according to bitmap0) areas, it gets an error.
 - NBD_CMD_TRIM (discard) works as for the full backup, no difference

4. After the backup is complete, we should destroy the fleecing scheme:

  - Same as for the full backup

5. Next, we should handle dirty bitmaps:

5.1 Failure path

Merge bitmap1 back into bitmap0 and continue tracking in bitmap0:

qmp: transaction [
    block-dirty-bitmap-enable {node: disk0, name: bitmap0}
    block-dirty-bitmap-merge {node: disk0, target: bitmap0, bitmaps: 
['bitmap1']}
    block-dirty-bitmap-remove {node: disk0, name: bitmap1}
]

5.2 Success path

We have two possible user scenarios on success:

5.2.1 Continue tracking for next incremental backup in bitmap1

In this case, just remove bitmap0:
qmp: block-dirty-bitmap-remove {node: disk0, name: bitmap0}

Alternatively, you may keep bitmap0 disabled instead of deleting it, to reuse it
in the future for differential backups (see below).

5.2.2 Continue tracking for the next incremental backup in bitmap0 (assuming we
always work with one bitmap, don't want any kind of differential backups, and
don't associate the bitmap name with stored backups)

In this case, enable and clear bitmap0, merge bitmap1 into bitmap0 and remove
bitmap1:

qmp: transaction [
    block-dirty-bitmap-enable {node: disk0, name: bitmap0}
    block-dirty-bitmap-clear {node: disk0, name: bitmap0}
    block-dirty-bitmap-merge {node: disk0, target: bitmap0, bitmaps: 
['bitmap1']}
    block-dirty-bitmap-remove {node: disk0, name: bitmap1}
]



== Push backup with fleecing full/incremental ==

Reasoning: the main problem of simple push backup is that guest writes may be
seriously affected by copy-before-write operations when the backup target is
slow. To solve this problem, we use the same scheme as for pull backup: we
create a local temporary image which is the target for copy-before-write
operations, but instead of exporting the "snapshot-access" node we start an
internal backup from it to the target.

So the scheme and commands look exactly the same as for full and incremental
pull backup. The only difference is that we don't need to start an NBD export;
instead we add the target node to QEMU and start an internal backup. The good
thing is that this can be done in the same transaction that initializes the
fleecing scheme:

qmp: transaction [
    ... initialize fleecing scheme for full or incremental backup ...

# Add the target node. Here a qcow2 image is added, but it may be an NBD node or something else
    blockdev-add {node-name: target-protocol, driver: file, filename: 
target.qcow2}
    blockdev-add {node-name: target, driver: qcow2, file: target-protocol}

# Start backup
    blockdev-backup {device: acc, target: target, ...}
]
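
For the full-backup case, the final backup command could look like this in wire
syntax (shown standalone for brevity; inside the transaction above it becomes
the corresponding blockdev-backup action, and the job id is an example name):

  -> {"execute": "blockdev-backup",
      "arguments": {"job-id": "push-backup0", "device": "acc",
                    "target": "target", "sync": "full"}}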

If it is an incremental backup, also pass the bitmap parameters (note that
sync=incremental implies bitmap-mode=on-success, so an explicit bitmap-mode
requires sync=bitmap):

    blockdev-backup {..., bitmap: bitmap0, sync: bitmap, bitmap-mode: never}

Note bitmap-mode=never: this means that the backup will do nothing with bitmap0,
so we have the same scheme as for pull backups (handle the bitmaps by hand after
the backup). Still, the push-backup scheme may be adapted to use other bitmap
modes.

What we lack here is discarding data in the 'acc' node after a block has been
successfully copied to the target, to save disk space and avoid extra
copy-before-write operations. That's a TODO; it should be implemented as a
discard-source parameter for blockdev-backup.



== Differential backups ==

I'm not a fan of this idea, but I think it should be described.

Assume we already have a chain of incremental backups (represented, for example,
as a qcow2 chain on the backup storage server). They correspond to some points
in time: T0, T1, T2, T3. Assume T3 is the last backup.

If we wanted to create the usual incremental backup, it would be the diff
between T3 and the current time (which becomes T4).

A differential backup says: I want to make a backup starting from T1 up to the
current time. What for? Maybe T2 and T3 were removed or somehow damaged.

How to do that in QEMU: on each incremental backup you start a new bitmap and
_keep_ the old one, disabled.
This way we have bitmap0 (which represents the diff between T0 and T1), bitmap1
(diff T1..T2), bitmap2 (diff T2..T3), and bitmap3, which holds the diff from T3
up to the current time. bitmap3 is the only enabled bitmap; the others are
disabled.

So, to make a differential backup, use the block-dirty-bitmap-merge command to
merge all the bitmaps you need into one, and then use it in any of the backup
schemes above.
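
For example, a differential backup covering everything since T1 could be
prepared like this (the temporary bitmap name diff-since-t1 is made up for
illustration):

  -> {"execute": "block-dirty-bitmap-add",
      "arguments": {"node": "disk0", "name": "diff-since-t1", "disabled": true}}
  -> {"execute": "block-dirty-bitmap-merge",
      "arguments": {"node": "disk0", "target": "diff-since-t1",
                    "bitmaps": ["bitmap1", "bitmap2", "bitmap3"]}}

Then use diff-since-t1 in place of bitmap0 in any of the schemes above, and
remove it when the backup is finished.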

The drawback is that all these disabled bitmaps eat RAM. A possible solution is
to not keep them in RAM: it's OK to keep them only in the qcow2 file and load
them on demand. That's not implemented yet and is a TODO for those who want
differential backups.

--
Best regards,
Vladimir


