
Re: [Qemu-devel] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extension


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH v3] doc: Add NBD_CMD_BLOCK_STATUS extension
Date: Mon, 5 Dec 2016 11:36:17 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1

02.12.2016 23:39, John Snow wrote:

On 12/02/2016 01:45 PM, Alex Bligh wrote:
John,

+Some storage formats and operations over such formats express a
+concept of data dirtiness. Whether the operation is block device
+mirroring, incremental block device backup or any other operation with
+a concept of data dirtiness, they all share a need to provide a list
+of ranges that this particular operation treats as dirty.

How can data be 'dirty' if it is static and unchangeable? (I thought)

In a simple case, live IO goes to e.g. hda.qcow2. These writes come from
the VM and cause the bitmap that QEMU manages to become dirty.

We intend to expose the ability to fleece dirty blocks via NBD. What
happens in this scenario would be that a snapshot of the data at the
time of the request is exported over NBD in a read-only manner.

In this way, the drive itself is R/W, but the "view" of it from NBD is
RO. While a hypothetical backup client is busy copying data out of this
temporary view, new writes are coming in to the drive, but are not being
exposed through the NBD export.

(This goes into QEMU-specifics, but those new writes are dirtying a
version of the bitmap not intended to be exposed via the NBD channel.
NBD gets effectively a snapshot of both the bitmap AND the data.)
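To make that concrete, here is a minimal Python sketch of the fleecing idea (purely illustrative, not QEMU code; all names are invented, and block-sized aligned I/O is assumed for brevity): guest writes dirty only the live bitmap, while NBD reads always see the point-in-time data.

    # Hypothetical sketch of a fleecing view.
    class FleecingView:
        def __init__(self, disk, frozen_bitmap):
            self.disk = disk                    # live data: block index -> bytes
            self.frozen_bitmap = frozen_bitmap  # point-in-time dirty bitmap (RO)
            self.live_bitmap = set()            # dirtied by writes after snapshot
            self.cow = {}                       # preserved pre-write contents

        def guest_write(self, block, data):
            # Preserve the old contents before the live write lands, so
            # the export's view stays at the point in time.
            if block not in self.cow:
                self.cow[block] = self.disk.get(block, b'\x00')
            self.disk[block] = data
            self.live_bitmap.add(block)         # only the live bitmap is dirtied

        def nbd_read(self, block):
            # The NBD client always sees the snapshot-time data.
            return self.cow.get(block, self.disk.get(block, b'\x00'))

        def nbd_block_status(self, block):
            # Dirty status is answered from the frozen bitmap.
            return block in self.frozen_bitmap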

Thanks. That makes sense - or enough sense for me to carry on commenting!

Whew! I'm glad.

I now think what you are talking about is backing up a *snapshot* of a disk
that's running, where the disk itself was not connected using NBD? I.e. it's
not being 'made dirty' by NBD_CMD_WRITE etc. Rather 'dirtiness' is effectively
an opaque state represented in a bitmap, which is binary metadata
at some particular level of granularity. It might as well be 'happiness'
or 'is coloured blue'. The NBD server would (normally) have no way of
manipulating this bitmap.

In previous comments, I said 'how come we can set the dirty bit through
writes but can't clear it?'. I now think that statement was wrong,
as NBD_CMD_WRITE etc. is not defined to set the dirty bit. The
state of the bitmap comes from whatever sets the bitmap, which is
outside the scope of this protocol; the protocol merely transmits it.

You know, this is a fair point. We have not (to my knowledge) yet
carefully considered the exact bitmap management scenario when NBD is
involved in retrieving dirty blocks.

Humor me for a moment while I talk about a (completely hypothetical, not
yet fully discussed) workflow for how I envision this feature.

(1) User sets up a drive in QEMU, a bitmap is initialized, an initial
backup is made, etc.

(2) As writes come in, QEMU's bitmap is dirtied.

(3) The user decides they want to root around to see what data has
changed and would like to use NBD to do so, in contrast to QEMU's own
facilities for dumping dirty blocks.

(4) A command is issued that creates a temporary, lightweight snapshot
('fleecing') and exports this snapshot over NBD. The bitmap is
associated with the NBD export at this point at NBD server startup. (For
the sake of QEMU discussion, maybe this command is "blockdev-fleece")

(5) At this moment, the snapshot is static and represents the data at
the time the NBD server was started. The bitmap is also forked and
represents only this snapshot. The live data and bitmap continue to change.

(6) Dirty blocks are queried and copied out via NBD.

(7) The user closes the NBD instance upon completion of their task,
whatever it was. (Making a new incremental backup? Just taking a peek at
some changed data? Who knows.) A client-side sketch of steps (4)-(7)
follows below.

The interesting point here is: what do we do with the two bitmaps
afterwards? The data delta can be discarded (this was, after all, just
a lightweight read-only point-in-time snapshot), but the bitmap data
needs to be dealt with.

(A) In the case of "User made a new incremental backup," the bitmap that
got forked off to serve the NBD read should be discarded.

(B) In the case of "User just wanted to look around," the bitmap should
be merged back into the bitmap it was forked from.

I don't advise a hybrid case of "User copied some data, but not all,"
where we need to partially clear *and* merge, but conceivably this
could happen, because the things we don't want to happen always will.

At this point maybe it's becoming obvious that actually it would be very
prudent to allow the NBD client itself to inform QEMU via the NBD
protocol which extents/blocks/(etc) that it is "done" with.

Maybe it *would* actually be useful if, in adding a "dirty" bit to the
NBD specification, we also allowed users to clear those bits.

Then, whether the user was trying to do (A) or (B) or the unspeakable
amalgamation of both, it's up to the user to clear the bits they want
cleared, and QEMU is left with the simple task of always merging the
bitmap fork back upon the conclusion of the NBD fleecing exercise.

Maybe this would allow the dirty bit to have a bit more concrete meaning
for the NBD spec: "The bit stays dirty until the user clears it, and is
set when the matching block/extent/etc is written to."

With an exception that external management may cause the bits to clear.
(I.e., someone fiddles with the backing store in a way opaque to NBD,
e.g. someone clears the bitmap directly through QEMU instead of via NBD.)
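A hedged sketch of those semantics, with a purely hypothetical in-band
clear command (nothing like it exists in the spec today); the granularity
value is an arbitrary example:

    GRANULARITY = 64 * 1024  # example granularity in bytes (an assumption)

    class DirtyBitmap:
        def __init__(self):
            self.dirty = set()  # indices of dirty extents

        def _extents(self, offset, length):
            first = offset // GRANULARITY
            last = (offset + length - 1) // GRANULARITY
            return range(first, last + 1)

        def on_write(self, offset, length):
            # The bit is set when the matching extent is written...
            self.dirty.update(self._extents(offset, length))

        def on_client_clear(self, offset, length):
            # ...and cleared only on explicit request (the hypothetical
            # in-band command, or out-of-band management such as QMP).
            self.dirty.difference_update(self._extents(offset, length))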

There is currently one possible "I'm done with the entire bitmap"
signal, which is closing the connection. This has two obvious
problems. Firstly, if used, it discards the entire bitmap (not
individual bits).
Secondly, it makes recovery from a broken TCP session difficult
(as either you treat a dirty close as meaning the bitmap needs
to hang around, in which case you have a garbage collection issue,
or you treat it as needing to drop the bitmap, in which case you
can't recover).

In my mind, I wasn't treating closing the connection as the end of the
point-in-time snapshot; that would be stopping the export.

I wouldn't advocate for a control channel (QEMU, here) clearing the
bitmap just because a client disappeared.

Either:

(A) QEMU clears the bitmap because the NBD export was *stopped*, or
(B) QEMU, acting as the NBD server, clears the bitmap as instructed by
the NBD client, if we admit a provision to clear bits from the NBD
protocol itself.

I don't think there's room for the NBD server (QEMU) deciding to clear
bits based on connection status. It has to be an explicit decision --
either via NBD or QMP.

I think in your plan the block status doesn't change once the bitmap
is forked. In that case, adding some command (optional) to change
the status of the bitmap (or simply to set a given extent to status X)
would be reasonable. Of course whether it's supported could be dependent
on the bitmap.

What I described as "forking" was perhaps a poor description. What
really happens when we have a divergence is that the bitmap with data
is split into two related bitmaps:

- A new, empty bitmap is created and takes over for the old bitmap,
recording writes to the live version of the data.
- The old bitmap, as it existed, remains in a read-only state and
describes some point-in-time snapshot view of the data.

In the case of an incremental backup, once we've made a backup of the
data, that read-only bitmap can actually be discarded without further
thought.

In the case of a failed incremental backup, or in the case of "I just
wanted to look and see what has changed, but wasn't prepared to reset
the counter yet," this bitmap gets merged back with the live bitmap as
if nothing ever happened.
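In (rough, assumed) Python terms, treating a bitmap as a set of dirty
extent indices, the split and its two outcomes might look like:

    def split_bitmap(live):
        # Freeze the current bits as the point-in-time bitmap; the live
        # bitmap starts over, empty, recording only new writes.
        frozen = set(live)
        live.clear()
        return frozen

    def on_backup_success(frozen):
        frozen.clear()   # delta copied out; the frozen bitmap is discarded

    def on_backup_abort(live, frozen):
        live |= frozen   # merge back, as if nothing ever happened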

ANYWAY, allowing the NBD client to request that bits be cleared has an
obvious use case even for QEMU, IMO -- the NBD client itself gains the
ability to decide for itself, without relying on a control plane to the
server, whether it is going to "make a backup" or "just look around."

The client gains the ability to leave the bitmap alone (QEMU will
re-merge it later once the snapshot is closed) or the ability to clear
it ("I made my backup, we're done with this.")

That usefulness would allow us to have an explicit dirty bit mechanism
directly in NBD, IMO, because:

(1) An RW NBD server has enough information to mark a bit dirty.
(2) Since there exists an in-spec mechanism to reset the bitmap, the
dirty bit is meaningful to both the server and the client.

Having missed most of the discussion on v1/v2, is it a given that we
want in-band identification of bitmaps?

I guess this might depend very heavily on the nature of the definition
of the "dirty bit" in the NBD spec.

I don't think it's a given. I think Wouter & I came up with it at
the same time as a way to abstract the bitmap/extent concept and
remove the need to specify a dirty bit at all (well, that's my excuse
anyway).

OK. We do certainly support multiple bitmaps being active at a time in
QEMU, but I had personally always envisioned that you'd associate them
one-at-a-time when starting the NBD export of a particular device.

I don't have a use case in my head where two distinct bitmaps being
exposed simultaneously offer any particular benefit, but maybe there is
something. I'm sure there is.

I will leave this aspect of it more to you NBD folks. I think QEMU could
cope with either.

(Vladimir, am I wrong? Do you have thoughts on this in particular? I
haven't thought through this aspect of it very much.)

I'm ok with either too.

Yes, with online external backup (fleecing), the bitmap is already
selected in QEMU. And this is the most interesting case, anyway.

For offline external backup, when we have an RO disk, it may have
several bitmaps, for example for different backup frequencies, and it
may not be bad for the client to have the ability to choose. But again,
we can select the exported bitmap through QMP (even the same fleecing
scheme will be OK, with the overhead of creating an empty delta).


Anyway, I hope I am being useful and not just more confounding. It seems
to me that we're having difficulty conveying precisely what it is we're
trying to accomplish, so I hope that I am making a good effort in
elaborating on our goals/requirements.

Yes absolutely. I think part of the challenge is that you are quite
reasonably coming at it from the point of view of qemu's particular
need, and I'm coming at it from 'what should the nbd protocol look
like in general' position, having done lots of work on the protocol
docs (though I'm an occasional qemu contributor). So there's necessarily
a gap of approach to be bridged.

Yeah, I understand quite well that we need to make sure the NBD spec is
sane and useful in a QEMU-agnostic way, so my goal here is just to help
elucidate our needs to enable you to reach a good consensus.

I'm overdue on a review of Wouter's latest patch (partly because I need
to re-diff it against the version with no NBD_CMD_BLOCK_STATUS in),
but I think it's a bridge worth building.

Same. Thank you for your patience!

Cheers,
--js


--
Best regards,
Vladimir


