Re: [Qemu-block] [RFC v2] new, node-graph-based fleecing and backup


From: Max Reitz
Subject: Re: [Qemu-block] [RFC v2] new, node-graph-based fleecing and backup
Date: Fri, 17 Aug 2018 23:50:50 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 2018-08-14 19:01, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
> ---
> 
> [v2 is just a resend. I forgot to add Den and me to cc, and I don't see the
> letter in my Thunderbird at all. Strange. Sorry for that.]
> 
> Hi all!
> 
> Here is an idea and kind of proof-of-concept of how to unify and improve
> push/pull backup schemes.
> 
> Let's start with fleecing, a way of exporting a point-in-time snapshot without
> creating a real snapshot. Currently we do it with the help of backup(sync=none).
> 
> Proposal:
> 
> For fleecing we need two nodes:
> 
> 1. fleecing hook. It's a filter which should be inserted on top of the active
> disk. Its main purpose is handling guest writes with a copy-on-write
> operation, i.e. it's a substitute for the write notifier in the backup job.
> 
> 2. fleecing cache. It's a target node for COW operations by fleecing-hook.
> It also represents a point-in-time snapshot of the active disk for readers.

It's not really COW, it's copy-before-write, isn't it?  It's something
else entirely.  COW is about writing data to an overlay *instead* of
writing it to the backing file.  Ideally, you don't copy anything,
actually.  It's just a side effect that you need to copy things if your
cluster size doesn't happen to match exactly what you're overwriting.

CBW is about copying the old data to the overlay and then leaving it alone,
while writing the new data to the backing file instead.

I'm not sure how important it is; I just wanted to make a note so we don't
somehow misunderstand what's going on.
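
To make the distinction concrete, here is a rough Python sketch of the two
write paths (the image objects and their read()/write() methods are made up
for illustration; this is not QEMU code):

def cow_write(overlay, backing, offset, data, cluster_size):
    # COW: the guest's new data goes into the overlay *instead* of the
    # backing file.  Copying only happens as a side effect, to fill up the
    # rest of a cluster that is not fully overwritten (assuming the cluster
    # is not yet allocated in the overlay).
    start = offset - offset % cluster_size
    end = -(-(offset + len(data)) // cluster_size) * cluster_size
    cluster = bytearray(backing.read(start, end - start))
    cluster[offset - start:offset - start + len(data)] = data
    overlay.write(start, bytes(cluster))

def cbw_write(active, fleecing_cache, copied_clusters, offset, data,
              cluster_size):
    # CBW: the *old* data is copied to the fleecing cache first and then left
    # alone; the guest's new data still goes to the active disk as usual.
    start = offset - offset % cluster_size
    end = -(-(offset + len(data)) // cluster_size) * cluster_size
    for cluster_start in range(start, end, cluster_size):
        if cluster_start not in copied_clusters:
            fleecing_cache.write(cluster_start,
                                 active.read(cluster_start, cluster_size))
            copied_clusters.add(cluster_start)
    active.write(offset, data)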


The fleecing hook sounds good to me, but I'm asking myself why we don't
just add that behavior to the backup filter node.  That is, re-implement
backup without before-write notifiers by making the filter node actually
do something (I think there was some reason, but I don't remember).

> The simplest realization of the fleecing cache is a temporary qcow2 image,
> backed by the active disk, i.e.:
> 
>     +-------+
>     | Guest |
>     +---+---+
>         |
>         v
>     +---+-----------+  file     +-----------------------+
>     | Fleecing hook +---------->+ Fleecing cache(qcow2) |
>     +---+-----------+           +---+-------------------+
>         |                           |
> backing |                           |
>         v                           |
>     +---+---------+      backing    |
>     | Active disk +<----------------+
>     +-------------+
> 
> Hm. No, because of permissions I can't do that; I have to do it like this:
> 
>     +-------+
>     | Guest |
>     +---+---+
>         |
>         v
>     +---+-----------+  file     +-----------------------+
>     | Fleecing hook +---------->+ Fleecing cache(qcow2) |
>     +---+-----------+           +-----+-----------------+
>         |                             |
> backing |                             | backing
>         v                             v
>     +---+---------+   backing   +-----+---------------------+
>     | Active disk +<------------+ hack children permissions |
>     +-------------+             |     filter node           |
>                                 +---------------------------+
> 
> Ok, this works, it's an image fleecing scheme without any block jobs.

So this is the goal?  Hm.  How useful is that really?

I suppose technically you could allow blockdev-add'ing a backup filter
node (though only with sync=none) and that would give you the same.
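
For comparison, the job-based fleecing we have today looks roughly like this
(it's what iotest 222 sets up; I'm writing the QMP arguments from memory, so
take the exact spellings as a sketch, and 'drive0' and the file names are
placeholders):

fleecing_setup = [
    # 1. Add the temporary overlay that will receive the copied-out clusters;
    #    unallocated reads fall through to the source via the backing link.
    ('blockdev-add', {
        'node-name': 'fleece',
        'driver': 'qcow2',
        'file': {'driver': 'file', 'filename': 'fleece.qcow2'},
        'backing': 'drive0',
    }),
    # 2. Start the sync=none backup job; its before-write notifier copies old
    #    data into 'fleece' before the guest overwrites it.
    ('blockdev-backup', {
        'job-id': 'fleecing-job',
        'device': 'drive0',
        'target': 'fleece',
        'sync': 'none',
    }),
    # 3. Export the frozen view so an external client can read it.
    ('nbd-server-start', {
        'addr': {'type': 'unix', 'data': {'path': '/tmp/fleecing.sock'}},
    }),
    ('nbd-server-add', {'device': 'fleece'}),
]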

> Problems with realization:
> 
> 1. What to do with the hack-permissions node? What is the right way to
> implement something like this? How do we tune permissions to avoid this
> additional node?

Hm, how is that different from what we currently do?  Because the block
job takes care of it?

Well, the user would have to guarantee the permissions.  And they can
only do that by manually adding a filter node in the backing chain, I
suppose.

Or they just start a block job which guarantees the permissions work...
So maybe it's best to just stay with a block job as it is.

> 2. Inserting/removing the filter. Do we have a working way to do it, or
> developments toward one?

Berto has posted patches for an x-blockdev-reopen QMP command.

> 3. Interesting: we can't set up the backing link to the active disk before
> inserting the fleecing hook; otherwise the insertion will damage this link.
> This means that we can't create the fleecing-cache node in advance, with its
> backing already set, and then reference it when creating the fleecing hook.
> And we can't prepare all the nodes in advance and then insert the filter.
> We have to:
> 1. create all the nodes with all links in one big JSON, or

I think that should be possible with x-blockdev-reopen.

> 2. set backing links/create nodes automatically, as is done in this RFC
>    (a bad way, I think: not clear, not transparent)
> 
> 4. Is it a good idea to use "backing" and "file" links in such way?

I don't think so, because you're pretending it to be a COW relationship
when it isn't.  Using backing for what it is is kind of OK (because
that's what the mirror and backup filters do, too), but then using
"file" additionally is a bit weird.

(Usually, "backing" refers to a filtered node with COW, and "file" then
refers to the node where the overlay driver stores its data and
metadata.  But you'd store old data there (instead of new data), and no
metadata.)

> Benefits, or, what can be done:
> 
> 1. We can implement a special fleecing-cache filter driver, which will be a real
> cache: it will store some recently written clusters in RAM, it can have a
> backing (or file?) qcow2 child to flush some clusters to disk, etc. So,
> for each cluster of the active disk we will have the following characteristics:
> 
> - changed (changed in the active disk since backup start)
> - copy (we need this cluster for the fleecing user. For example, in the RFC
> patch all clusters are "copy": cow_bitmap is initialized to all ones. We can
> use an existing bitmap to initialize cow_bitmap, which would provide
> "incremental" fleecing (for use in incremental backup push or pull).)
> - cached in RAM
> - cached on disk

Would it be possible to implement such a filter driver that could just
be used as a backup target?
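
Such a driver would essentially maintain a small per-cluster state map
alongside the data. A toy sketch of the bookkeeping from the list above
(field and type names invented, nothing QEMU-specific):

from dataclasses import dataclass, field

@dataclass
class ClusterState:
    changed: bool = False   # guest overwrote this cluster since backup start
    copy: bool = True       # the fleecing user still needs the old data
    in_ram: bool = False    # old data currently held in the RAM cache
    on_disk: bool = False   # old data flushed to the qcow2 (disk cache) child

@dataclass
class FleecingCacheMap:
    cluster_size: int
    clusters: dict = field(default_factory=dict)  # cluster index -> ClusterState

    def state(self, offset):
        return self.clusters.setdefault(offset // self.cluster_size,
                                        ClusterState())

    def on_guest_write(self, offset):
        # 'copy' could instead start out from an existing dirty bitmap to get
        # the "incremental" fleecing variant mentioned above.
        self.state(offset).changed = True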

> On top of these characteristics we can implement the following features:
> 
> 1. COR: we can cache clusters not only on writes but on reads too, if we have
> free space in ram-cache (and if not, do not cache at all and don't write to
> disk-cache). It may be done like bdrv_write(..., BDRV_REQ_UNNECESARY)

You can do the same with backup by just putting a fast overlay between
source and the backup, if your source is so slow, and then do COR, i.e.:

slow source --> fast overlay --> COR node --> backup filter
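
Something like this, if I remember the copy-on-read filter's options
correctly (node names are placeholders, so take it as a sketch):

# The chain above, built with blockdev-add: the fast overlay sits on top of
# the slow source, and the copy-on-read filter populates the overlay whenever
# a read falls through to the slow source.
fast_overlay = {
    'node-name': 'fast-overlay',
    'driver': 'qcow2',
    'file': {'driver': 'file', 'filename': 'fast-overlay.qcow2'},
    'backing': 'slow-source',
}
cor_node = {
    'node-name': 'cor',
    'driver': 'copy-on-read',
    'file': 'fast-overlay',
}
# The backup then reads through 'cor', so every cluster the backup pulls also
# lands in the fast overlay.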

> 2. Benefit for the guest: if a cluster is unchanged and ram-cached, we can
> skip reading from the device.
> 
> 3. If needed, we can drop unchanged ram-cached clusters from ram-cache
> 
> 4. On guest write, if cluster is already cached, we just mark it "changed"
> 
> 5. Lazy discards: in some setups, discards are not guaranteed to do anything,
> so we can at least defer some discards to the end of the backup if ram-cache
> is full.
> 
> 6. We can implement a discard operation in the fleecing cache, to mark a
> cluster as no longer needed (drop it from the cache, drop its "copy" flag),
> so further reads of that cluster will return an error. So, the fleecing
> client may read clusters one by one and discard them to reduce the COW load
> on the drive. We can even combine read and discard into one command,
> something like "read-once", or it may be a flag on fleecing-cache saying
> that all reads are "read-once".

That would definitely be possible with a dedicated fleecing backup
target filter (and normal backup).
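
The client side of that "read-once" pattern could be as simple as this (the
client and sink objects are assumed interfaces, just to show the
read-then-discard loop):

def drain_fleecing_export(client, sink, disk_size, cluster_size):
    # 'client' reads from the exported fleecing cache, 'sink' is wherever the
    # external backup tool stores the data -- both are hypothetical.
    for offset in range(0, disk_size, cluster_size):
        sink.write(offset, client.read(offset, cluster_size))
        # Tell the cache this cluster is no longer needed: it can drop the
        # data and the "copy" flag, reducing CBW load on the active disk.
        client.discard(offset, cluster_size)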

> 7. We can provide recommendations on which clusters the fleecing client
> should copy first. Examples:
> a. copy ram-cached clusters first (obvious: to unload the cache and reduce
>    I/O overhead)
> b. copy zero clusters last (they don't occupy space in the cache, so let's
>    copy other clusters first)
> c. copy disk-cached clusters last (if we don't care about disk space, we can
>    say that for disk-cached clusters we have already paid the maximum I/O
>    overhead, so let's copy other clusters first)
> d. copy disk-cached clusters with high priority (but after ram-cached) if we
>    don't have enough disk space
> 
> So, there is a wide range of possible policies. How do we provide these
> recommendations?
> 1. block_status
> 2. create a separate interface
> 3. the internal backup job may access the shared fleecing object directly.

Hm, this is a completely different question now.  Sure, extending backup
or mirror (or a future blockdev-copy) would make it easiest for us.  But
then again, if you want to copy data off a point-in-time snapshot of a
volume, you can just use normal backup anyway, right?

So I'd say the purpose of fleecing is that you have an external tool
make use of it.  Since my impression was that you'd just access the
volume externally and wouldn't actually copy all of the data off of it
(because that's what you could use the backup job for), I don't think I
can say much here, because my impression seems to have been wrong.
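
For what it's worth, whichever interface ends up carrying the
recommendations, the consumer side boils down to a sort. A toy version of
policies (a)-(d) (the flags are hypothetical, mirroring the list above):

from collections import namedtuple

Cluster = namedtuple('Cluster', ['index', 'in_ram', 'on_disk', 'is_zero'])

def copy_order(clusters, low_on_disk_space=False):
    # Lower number = copy earlier.
    def priority(c):
        if c.in_ram:
            return 0                              # (a) unload the RAM cache first
        if c.is_zero:
            return 3                              # (b) zeros cost nothing to keep
        if c.on_disk:
            return 1 if low_on_disk_space else 2  # (d) vs. (c)
        return 2 if low_on_disk_space else 1
    return sorted(clusters, key=priority)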

> About internal backup:
> Of course, we need a job which will copy clusters. But it will be simplified:

So you want to completely rebuild backup based on the fact that you
specifically have fleecing now?

I don't think that will be any simpler.

I mean, it would make blockdev-copy simpler, because we could
immediately replace backup by mirror, and then we just have mirror,
which would then automatically become blockdev-copy...

But it's not really going to be simpler, because whether you put the
copy-before-write logic into a dedicated block driver, or into the
backup filter driver, doesn't really make it simpler either way.  Well,
adding a new driver always is a bit more complicated, so there's that.

> it should not care about guest writes; it copies clusters from a kind of
> snapshot which does not change over time. This job should follow the
> recommendations from the fleecing scheme [7].
> 
> What about the target?
> 
> We can use a separate node as the target and copy from the fleecing cache to
> the target. If we have only ram-cache, it is equivalent to the current
> approach (data is copied directly to the target, even on COW). If we have
> both ram- and disk-caches, it's a cool solution for a slow target: instead
> of making the guest wait for a long write to the backup target (when
> ram-cache is full), we can write to disk-cache, which is local and fast.

Or you backup to a fast overlay over a slow target, and run a live
commit on the side.

> Another option is to combine the fleecing cache and the target somehow (I
> haven't really thought about this).
> 
> Finally, with one or two (three?) special filters we can implement all
> current fleecing/backup schemes in a unified and very configurable way, and
> add a lot more cool features and possibilities.
> 
> What do you think?

I think adding a specific fleecing target filter makes sense because you
gave many reasons for interesting new use cases that could emerge from that.

But I think adding a new fleecing-hook driver just means moving the
implementation from backup to that new driver.

Max

> I really need help with creating/inserting/destroying the fleecing graph; my
> code for it is a hack, I don't like it, it just works.
> 
> About testing: to show that this works I use the existing fleecing test,
> 222, a bit tuned (drop the block job and use a new QMP command to remove
> the filter).
