Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps

From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH v2 RFC 0/8] block: persistent dirty bitmaps
Date: Wed, 8 Jul 2015 15:24:44 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

On 24.06.2015 03:21, John Snow wrote:

On 06/17/2015 10:29 AM, Vladimir Sementsov-Ogievskiy wrote:
On 12.06.2015 22:34, John Snow wrote:
On 06/08/2015 11:21 AM, Vladimir Sementsov-Ogievskiy wrote:
   - rebase on my 'Dirty bitmaps migration' series
   - remove 'print dirty bitmap', 'query-dirty-bitmap' and use md5 for
     testing like with dirty bitmaps migration
   - autoclean features


The bitmaps are saved into qcow2 file format. It provides both
'internal' and 'external' dirty bitmaps feature:
   - for qcow2 drives we can store bitmaps in the same file
   - for other formats we can store bitmaps in the separate qcow2 file

QCow2 header is extended by fields 'nb_dirty_bitmaps' and
'dirty_bitmaps_offset' like with snapshots.

Proposed command line syntax is the following:

-dirty-bitmap [option1=val1][,option2=val2]...
      Available options are:
      name         The name for the bitmap (necessary).

      file         The file to load the bitmap from.

      file_id      When specified with the 'file' option, this file will
                   be available through this id for other -dirty-bitmap
                   options. When specified without the 'file' option, it
                   is a reference to a 'file' specified with another
                   -dirty-bitmap option, and the bitmap will be loaded
                   from that file.

      drive        The drive to bind the bitmap to. It should match the
                   'id' suboption of one of the -drive options. If
                   neither 'file' nor 'file_id' is specified, then the
                   bitmap will be loaded from that drive (internal dirty
                   bitmap).

      granularity  The granularity for the bitmap. Not necessary, the
                   default value may be used.

      enabled      on|off. Default is 'on'. Disabled bitmaps do not
                   change regardless of writes to the corresponding drive.


qemu -drive file=a.qcow2,id=disk -dirty-bitmap name=b,drive=disk
qemu -drive file=a.raw,id=disk \
       -dirty-bitmap name=b,drive=disk,file=b.qcow2,enabled=off
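
To make the option rules above concrete, here is a minimal Python sketch of how such an option string could be interpreted. It is a model for discussion only, not QEMU code: the function name parse_dirty_bitmap_opts and the 'internal' key are hypothetical, and the rule that an internal bitmap requires 'drive' is an assumption drawn from the description of the 'drive' option.

```python
# Hypothetical model of the proposed -dirty-bitmap option string; not QEMU code.

def parse_dirty_bitmap_opts(optstr):
    """Split 'k1=v1,k2=v2' into a dict and apply the rules described above."""
    opts = {}
    for item in optstr.split(","):
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed option: {item!r}")
        opts[key] = value

    if "name" not in opts:
        raise ValueError("'name' is necessary")

    enabled = opts.get("enabled", "on")  # default is 'on'
    if enabled not in ("on", "off"):
        raise ValueError("'enabled' must be 'on' or 'off'")
    opts["enabled"] = enabled == "on"

    # Neither 'file' nor 'file_id': the bitmap lives inside the drive's image.
    opts["internal"] = "file" not in opts and "file_id" not in opts
    if opts["internal"] and "drive" not in opts:
        # Assumption: without a file there is nowhere else to load from.
        raise ValueError("an internal bitmap needs 'drive'")
    return opts


print(parse_dirty_bitmap_opts("name=b,drive=disk"))
print(parse_dirty_bitmap_opts("name=b,drive=disk,file=b.qcow2,enabled=off"))
```

The two calls mirror the two example command lines: the first yields an internal, enabled bitmap, the second an external, disabled one.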

Vladimir Sementsov-Ogievskiy (8):
    spec: add qcow2-dirty-bitmaps specification
    qcow2: add dirty-bitmaps feature
    block: store persistent dirty bitmaps
    block: add bdrv_load_dirty_bitmap
    qcow2: add qcow2_dirty_bitmap_delete_all
    qcow2: add autoclear bit for dirty bitmaps
    qemu: command line option for dirty bitmaps
    iotests: test internal persistent dirty bitmap

   block.c                       |  82 +++++++
   block/Makefile.objs           |   2 +-
   block/qcow2-dirty-bitmap.c    | 537
   block/qcow2.c                 |  69 +++++-
   block/qcow2.h                 |  61 +++++
   blockdev.c                    |  38 +++
   docs/specs/qcow2.txt          |  66 ++++++
   include/block/block.h         |   9 +
   include/block/block_int.h     |  10 +
   include/sysemu/blockdev.h     |   1 +
   include/sysemu/sysemu.h       |   1 +
   qemu-options.hx               |  37 +++
   tests/qemu-iotests/118        |  83 +++++++
   tests/qemu-iotests/118.out    |   5 +
   tests/qemu-iotests/group      |   1 +
   tests/qemu-iotests/iotests.py |   6 +
   vl.c                          | 100 ++++++++
   17 files changed, 1105 insertions(+), 3 deletions(-)
   create mode 100644 block/qcow2-dirty-bitmap.c
   create mode 100755 tests/qemu-iotests/118
   create mode 100644 tests/qemu-iotests/118.out

Well, you said "RFC" ... So here's some "C" that you RF'd.

Many of these points are a "wish list" of sorts and don't necessarily
have to be implemented all at once, but we should be careful to design
the core series with the later additions in mind.

Many of these items are things that I wouldn't mind working on
(Primarily the QMP interfaces), provided that the core of this series
will allow for them to exist. I can take many of the QMP/transaction
interface projects, for instance.

I'm starting to think we won't be able to squeeze this in for 2.4, but
we can have a bulk of the work well underway for 2.5, by which point I
am hopeful that libvirt will be beginning to pick up motion for
integration of this feature.

I think that the basic approach you have so far is good, we just have to
plan out our required extensions and then we can review the base to make
sure it supports the features we want in the near future.

(1) General storage design

- Persistent bitmaps can be stored in any arbitrary qcow2 file,
regardless of whether that qcow2 holds data or not.

- Any given qcow2 file with or without data can hold bitmaps intended
for any number of other drives.
Actually, a dirty bitmap is not bound to the image; it just has a name
identifying it. We can (try to) load any bitmap for any image.

I'm not sure what you mean by "bound" here, but yes, as it stands: the
design is very flexible and I like that. It appears that bitmaps for any
number of images can be simultaneously stored in a .qcow2 as a generic
container, so it's a very flexible approach.

I didn't mean to imply that you couldn't do that already, because it
looks like you can.

- Dirty bitmaps are not assumed to be able to be stored in any
particular location.

So far, this is good. I like the flexibility this provides. This lets us
do all kinds of cool things like store bitmaps for 20 different raw
drives inside of a single 'bitmaps.qcow2' if we wish.

(2) Bitmaps added via QMP do not get any persistence attributes.

This is something we'll need to change. Existing QMP commands that let
us modify bitmaps:

block-dirty-bitmap-add        [+transaction]
block-dirty-bitmap-clear    [+transaction]

- block-dirty-bitmap-add:

We will want the ability for bitmap-add to specify a persistence option.
What I am less clear on is what this attribute should look like.

should we add target: <filename> as an attribute here,
or should it be target: <node> to specify the file object that we want
to store this bitmap in? Or perhaps both?:

mode: file, target: <filename>
mode: node, target: <node>

Or even an explicit usability feature that lets us specify that we wish
to store the bitmap for the drive we're attaching it to:

block-dirty-bitmap-add node=drive0 name=bitmap0 mode=self

The implication here is that the default value for persist could be
"none", which does not attempt to store this bitmap anywhere.

- block-dirty-bitmap-remove

If we remove a bitmap with persistence options active, it needs to be
cleared out of the file it is being stored in. Currently we use
"release" to remove a bitmap, which deletes only the in-memory portion
of the bitmap, so you also use release in your series to delete
in-memory bitmaps after we're done with them.

I think the semantics of the "remove" QMP option here, however, should
include a call to the storage layer to remove the bitmap in question.

Let's split the "release" function into two functions:
(A) bdrv_dirty_bitmap_free (which just frees the in-memory bits)
(B) bdrv_dirty_bitmap_delete (which relies on _free but deletes from
disk also.)

Then bdrv_close can use bitmap_free, but the QMP remove command can
utilize _delete.
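
The proposed split can be modelled in a few lines of Python. This is only a toy illustration: BitmapStore, its dictionaries and its method names are hypothetical stand-ins for the in-memory bitmap list and the qcow2 storage layer, not the C functions named above.

```python
# Toy model: `memory` stands in for in-memory bitmaps, `storage` for the
# bitmaps saved in a qcow2 file. All names here are hypothetical.

class BitmapStore:
    def __init__(self):
        self.memory = {}   # name -> bytearray holding the in-memory bits
        self.storage = {}  # name -> bytes saved on disk

    def add(self, name, persistent=False):
        self.memory[name] = bytearray(8)
        if persistent:
            self.storage[name] = bytes(8)

    def free(self, name):
        """(A) Drop only the in-memory bits; what close would use."""
        del self.memory[name]

    def delete(self, name):
        """(B) Relies on free(), then also removes the on-disk copy."""
        self.free(name)
        self.storage.pop(name, None)


store = BitmapStore()
store.add("bitmap0", persistent=True)
store.add("bitmap1", persistent=True)
store.free("bitmap0")     # close path: on-disk copy survives
store.delete("bitmap1")   # QMP remove path: on-disk copy is gone
print(sorted(store.storage))
```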

- block-dirty-bitmap-clear:

This needs to clear the bitmap on-disk if it has persistence features.
Does it? When the bitmap is loaded, its representation on disk is
inconsistent, and an in_use bit is set (on disk). So, we don't need to
sync it here.
Syncing on 'remove' is not necessary for the same reason, but may be
done to avoid storing extra trash.

Keeping an in-use bit is definitely a way to accommodate this QMP
command, so you're right, we don't need to pay special attention here --
unless we go with some kind of a periodic flush model, at which point if
the bitmap is "clean," this command will need to re-mark it as dirty.

Just a generic ->mark_bitmap_dirty() op to call here would suffice entirely.

- block-dirty-bitmap-copy:

This is only a proposal currently, but worth us keeping it in mind. We
should decide on copy semantics. Should the copy keep the persistence
attributes of the source bitmap by default and allow a user to override
it if desired, or should we force the persistence attribute back to
null/None until the user overrides?

I suspect defaulting it to no persistence is probably the sanest until
we're told otherwise (either via an extension to the copy command or a
later edit command.)

Since the QMP interfaces have been my area so far, I can draft their
addition as a new series if you'd like.

(3) Additional QMP interfaces

We should add the ability to modify a bitmap's persistence after it has
been added.

block-dirty-bitmap-edit mode=<file,node,self,none> target=<...>

This will allow us to add persistence to a bitmap after creation, or
remove persistence from a bitmap without deleting it if it's no longer

Perhaps at a later date we could even have it change where the bitmap is
stored through this mechanism.

(Usability features might include the ability for us to rename or change
the granularity of the bitmap, too -- but that's future usability stuff,
not core functionality.)

Like the above, I can draft this addition.
no objections)

(4) Storage Format

I think overall the bitmap extension headers look sane, but Kevin is the
ultimate authority here.

I /would/ like to see an additional header bitfield reserved
for some arbitrary flags that can be used at a later date. A uint32_t
should be sufficient for now, with some of the upper bits reserved
either for an extension or a version field to allow us to expand the
bitmap headers in the future if necessary.

(5) Bitmap autoloading

Bitmaps are not currently automatically loaded if you pass e.g. (-hda
my_drive_that_also_has_bitmaps.qcow2). This is in part because the drive
a bitmap was intended for is not information stored with the bitmap, so
QEMU has no concept or ability to be able to "auto load" bitmaps.

Hinted at earlier by my desire to see something like mode=self, we
should add some flags to the dirty bitmap header stored with each bitmap:

0x01: "This bitmap describes the file it is stored in"
0x02: "This bitmap should be auto-loaded when this file is opened."
0x04: "This bitmap is read-only (disabled.)"
The last one - should it be used only for auto-loading bitmaps?

Not necessarily, I just lumped it in here as an example to be grouped
near the other flags I thought we needed. Maybe we don't actually need a
read only flag to be stored because there's not currently a use-case for
RO bitmaps outside of migration.

Just a passing thought.

This way, with a properly modern version of QEMU, you could simply just:

qemu -M q35 -enable-kvm -hda windows10.qcow2

and if there were bitmaps inside of windows10.qcow2 that had 0x01 and
0x02 set, you'd get those bitmaps loaded before any IO to the data
clusters of the .qcow2, ensuring data integrity.

Of course, I think that it is currently too complicated to try to
accomplish autoloading of bitmaps for *other* drives, so let's not worry
about that now. This means 0x02 set without 0x01 would be an error.
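
As a rough illustration of the flag rules discussed so far, the following Python sketch encodes the three proposed bits and the "0x02 without 0x01 is an error" constraint. The names BitmapFlags, validate_flags and should_autoload are hypothetical; only the bit values and their meanings come from the proposal.

```python
from enum import IntFlag


class BitmapFlags(IntFlag):
    SELF = 0x01       # bitmap describes the file it is stored in
    AUTO = 0x02       # bitmap should be auto-loaded when the file is opened
    READ_ONLY = 0x04  # bitmap is read-only (disabled)


def validate_flags(raw):
    """Reject auto-load on a bitmap that belongs to some other drive."""
    flags = BitmapFlags(raw)
    if BitmapFlags.AUTO in flags and BitmapFlags.SELF not in flags:
        raise ValueError("0x02 (auto-load) set without 0x01 (self)")
    return flags


def should_autoload(raw):
    return BitmapFlags.AUTO in validate_flags(raw)


print(should_autoload(0x03))  # self + auto
print(should_autoload(0x01))  # self only
```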

Of course, when autoloading bitmaps, we'll have to check that the size
of the bitmap matches the size of the drive. This is easy to do, though.
it is always checked)

You're right. I think I wasn't convinced we needed size (etc) to be part
of the lookup process, but the way you have it now it does always check
the sizes. Just thinking out loud again.

The 0x01 bit can be set automatically when that circumstance is
detected, and 0x02 can be set perhaps as an option to
--dirty-bitmap auto=yes
or via the QMP
block-dirty-bitmap-add ... auto=yes
or via the edit command,
block-dirty-bitmap-edit ... auto=yes

Maybe we could also set it implicitly if mode=self is used, too.
Also, for auto-loading bitmaps, the user can load a bitmap manually
(changing the 'disabled' bit), and in this case auto-loading should be
skipped. Also, if auto-loading is the default behavior, then what about
--disable-bitmap-autoloading or something like this?

Agreed. Whatever the default is, we need a way to turn it off and be
explicit about it. Perhaps as an argument to -drive?


-drive if=none,file=linux-and-bitmaps.qcow2,bitmap=<auto,no>

where "no" would be a very explicit "Do not load any of the bitmaps in
this file."

"auto" would load automatically any of the bitmaps stored there with the
auto/self flags set, and skip the rest otherwise.

Maybe this is serviceable.
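
A sketch of how the suggested bitmap=<auto,no> policy might select bitmaps at open time, written in Python purely for illustration. The function bitmaps_to_load is hypothetical, and reading "auto/self flags set" as "both 0x01 and 0x02 set" is an assumption.

```python
SELF, AUTO = 0x01, 0x02  # flag values from the proposal above


def bitmaps_to_load(policy, stored):
    """Return the bitmap names to load when the drive is opened.

    `stored` maps bitmap name -> flags as found in the file.
    """
    if policy == "no":
        return []  # explicit: load none of the bitmaps in this file
    if policy == "auto":
        wanted = SELF | AUTO  # assumption: both bits must be set
        return sorted(n for n, f in stored.items() if f & wanted == wanted)
    raise ValueError(f"unknown bitmap policy: {policy!r}")


stored = {"bitmap0": 0x03, "bitmap1": 0x01, "other-drive": 0x00}
print(bitmaps_to_load("auto", stored))
print(bitmaps_to_load("no", stored))
```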

(6) qemu-img interface

Stefan has mentioned that it would be nice to implement a query ability
to qemu-img to list bitmaps stored in qcow2 files, along with some of
their key attributes: size, granularity, and any flags. It's probably not
efficient to list the dirty count, unless we begin storing that
information manually in the header. I don't think there's a strong need
for that level of info, though.

I can handle this part, if you'd like.
can qmp query block with information about bitmaps be reused here?

I don't think so. Here we'd be reading the bitmaps on-disk and reporting
the info stored in-file, instead of the in-memory structures.

(7) CLI interface

- The only way to get a bitmap loaded into memory from file is to use
the --dirty-bitmap argument where you specify the name, file,
destination drive, and granularity.

- The only way to create a new bitmap that will integrate with the
persistence features is to specify a new bitmap that does not currently
exist within a file and allow the qcow2 layer to create the in-memory
bitmap for us.

This helps us with the flexibility that makes this design a winning
choice overall, but it's cumbersome for some special common cases I
think we should be supporting.

As mentioned previously, I think granularity should not be part
of the lookup process -- just creation, and even then I think this CLI
syntax should not automatically create bitmaps if it wasn't found -- if
the user didn't intend to make a bitmap, an error is likely more

Perhaps --dirty-bitmap create=true,[...] would be sufficient for
specifying intent here, at which point granularity makes sense for the
creation process.

As for the granularity, I think this should be appropriate:

--dirty-bitmap file=bitmaps.qcow2,name=bitmap0,drive=drive0

And that should be sufficient to look in bitmaps.qcow2, find 'bitmap0',
and attach it to 'drive0', throwing an error if the sizes don't match.
agree, I will do it
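
The lookup-versus-create behaviour agreed on here could be sketched as follows. This is a Python model under stated assumptions: attach_bitmap, the dictionary layout of the container, and the 64 KiB default granularity are all hypothetical, chosen only to show that granularity matters at creation and that size is checked on lookup.

```python
def attach_bitmap(container, name, drive_size, create=False, granularity=65536):
    """Find `name` in `container` and check it fits a drive of `drive_size`.

    `container` maps bitmap name -> {'size': ..., 'granularity': ...} and
    stands in for a file such as bitmaps.qcow2.
    """
    entry = container.get(name)
    if entry is None:
        if not create:
            # No implicit creation: a missing bitmap is an error.
            raise LookupError(f"bitmap {name!r} not found")
        entry = {"size": drive_size, "granularity": granularity}
        container[name] = entry
        return entry
    if entry["size"] != drive_size:
        raise ValueError(f"bitmap {name!r} does not match the drive size")
    return entry  # granularity comes from the file, not from the caller


bitmaps = {"bitmap0": {"size": 1 << 30, "granularity": 65536}}
print(attach_bitmap(bitmaps, "bitmap0", 1 << 30))
print(attach_bitmap(bitmaps, "bitmap1", 1 << 30, create=True, granularity=131072))
```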

Great! I promise I do like the series overall even if I had a
book-length comment about it :)

(8) Namespaces

Stefan also asked me about the bitmap namespaces -- in-memory of course,
each node can have their own "bitmap0" without any collisions because
all bitmaps are always referred to by their (bs,name) pair.

How do we address bitmaps inside a file, though?

If any given bitmap containing .qcow2 file can store an arbitrary number
of bitmaps intended for an arbitrary number of destinations, how do we
handle this?

-dirty-bitmap name=bitmap0,drive=drive0,file=bitmaps.qcow2
-dirty-bitmap name=bitmap0,drive=drive1,file=bitmaps.qcow2

I think this might currently do very funky things, if bitmaps.qcow2 is
currently empty -- I think both calls will succeed, but it will fail
later when it tries to store them and cannot.

I think we need to do one of two things:

(A) Keep the namespace inside of a .qcow2 file as it is now, but ALWAYS
check up front if a bitmap *can* be added to the file. This way we don't
run into problems after we've dirtied the bitmap.

To clarify, I meant "every bitmap name inside of a file is unique. Check
to make sure it is possible to store a new bitmap upon its creation."

(B) Find a way to accommodate bitmaps with the same names that were
intended for different nodes.

I don't have a good idea for (B), so I think (A) is probably the way to
go. We can amend the bitmap documentation to specify that although the
bitmap names are unique per-node, if you want to store them in the same
file, you're going to want to give them globally unique names.
A: And what about the case of several raw disks and bitmaps.qcow2? In this
case using namespaces is impossible. Or we are going to have
*-bitmap.qcow2 for each disk..

We could continue letting users do drive0 bitmap0 and drive1 bitmap0,
but as soon as they try to use the QMP commands to store those bitmaps,
the QMP command will report an error if the bitmaps.qcow2 already has a

It effectively uses a per-file namespace for bitmaps and applies that
restriction to any in-memory bitmaps created with persistence flags.
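
A small Python model of that per-file namespace, with the up-front check from option (A). BitmapFile and Session are hypothetical names; the point is only that in-memory bitmaps stay keyed by (node, name) while a storage file rejects a second bitmap with the same name at creation time rather than at store time.

```python
class BitmapFile:
    """Stands in for a container such as bitmaps.qcow2."""

    def __init__(self, existing_names=()):
        self.names = set(existing_names)

    def reserve(self, name):
        # Every bitmap name inside one file must be unique.
        if name in self.names:
            raise ValueError(f"bitmap {name!r} already exists in this file")
        self.names.add(name)


class Session:
    def __init__(self):
        self.bitmaps = {}  # (node, name) -> BitmapFile, or None if not persistent

    def add(self, node, name, target=None):
        if (node, name) in self.bitmaps:
            raise ValueError("this node already has a bitmap with that name")
        if target is not None:
            target.reserve(name)  # fail now, before any bit gets dirtied
        self.bitmaps[(node, name)] = target


shared = BitmapFile()
session = Session()
session.add("drive0", "bitmap0", target=shared)
session.add("drive1", "bitmap0")  # fine: not persistent, per-node name
try:
    session.add("drive2", "bitmap0", target=shared)
except ValueError as err:
    print("rejected:", err)
```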

B: As I understand, we have no id or name for the image, it comes from
cmd line.. So we can't use node name as namespace name. Why not just add

-drive file=a.raw,id=disk1,dirty-bitmaps-namespace=disk1_ns \
-drive file=b.raw,id=disk2,dirty-bitmaps-namespace=disk2_ns \
  -dirty-bitmap name=bitmap0,drive=disk1,file=bitmaps.qcow2
  -dirty-bitmap name=bitmap1,drive=disk1,file=bitmaps.qcow2
  -dirty-bitmap name=bitmap0,drive=disk2,file=bitmaps.qcow2

Default namespace: empty string or node name?
The namespace name should be stored in the bitmap header for each bitmap,
as a separate field with a length field, or maybe as part of the bitmap
name (separated from it by a '#' character, for example).

I think that's starting to get a little too manual and verbose on the
CLI at this point. Maybe we really should just enforce the first option
and call it a day.

(9) Data consistency

We need to discuss the data safety element to this. I think that
atomically before the first write is flushed to disk, the dirty bitmap
needs to *at least* set a bit in the bitmap header that indicates that
the bitmap is no longer up-to-date.

When the bitmap is later flushed to disk, that bit can be cleared until
the next write occurs, which repeats the process.

We have discussed this (long ago) in the past, but one of the ideas was
to monitor the relative utilization rate of the disk and attempt to
flush the bitmap whenever there was a lull in disk IO, then clear the
"inconsistent" bit.

On close, the flush of data and bitmap both would lead us to clear this
bit as well.

Upon boot, if the inconsistent bit was set, we'd know that the bitmap
was outdated and we'd have to recommend that the bitmap be cleared and a
new bitmap started.
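
The lifecycle of that "inconsistent" bit can be sketched as a tiny state model. The Python below is illustrative only: PersistentBitmap, its attributes and open_from_disk are hypothetical, and real code would have to make sure the header update actually reaches the disk before the guest write does.

```python
class PersistentBitmap:
    def __init__(self, nbits):
        self.bits = [False] * nbits          # in-memory state
        self.on_disk_bits = [False] * nbits  # last flushed state
        self.on_disk_inconsistent = False    # header bit stored in the file

    def write(self, index):
        # Mark the on-disk copy stale before the first write is recorded.
        if not self.on_disk_inconsistent:
            self.on_disk_inconsistent = True
        self.bits[index] = True

    def flush(self):
        self.on_disk_bits = list(self.bits)
        self.on_disk_inconsistent = False    # valid until the next write

    def close(self):
        self.flush()


def open_from_disk(on_disk_bits, inconsistent):
    """Return the usable bits, or None if the bitmap is outdated."""
    if inconsistent:
        return None  # recommend clearing it and starting a new bitmap
    return list(on_disk_bits)


bm = PersistentBitmap(4)
bm.write(2)
# Simulated crash here: the file still says "inconsistent".
print(open_from_disk(bm.on_disk_bits, bm.on_disk_inconsistent))
bm.close()
print(open_from_disk(bm.on_disk_bits, bm.on_disk_inconsistent))
```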

(Or, perhaps, a data-intensive mode where we compare the current data
mode with the most recent incremental backup to re-determine what data
has changed. This would be very, very slow but an option at least for
recovery if starting a new full backup is even less desirable.)

Other ideas involve regularly flushing the bitmap at certain timed
intervals, certain usage intervals (e.g. when the changed bitmap data
reaches some total size, like 64KiB of changed bits), or a combination
of regular intervals with "opportunistic" flushing during Disk IO lulls.
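
One way to picture a combined policy is a single decision function that is consulted periodically. This Python sketch is an assumption-laden illustration: should_flush and its parameters are hypothetical, the 30-second interval is an arbitrary placeholder, and only the 64KiB threshold echoes the example figure above.

```python
def should_flush(now, last_flush, dirty_bytes, io_idle,
                 interval=30.0, threshold=64 * 1024):
    """Decide whether to write the bitmap out now.

    now, last_flush: timestamps in seconds
    dirty_bytes:     amount of changed bitmap data not yet on disk
    io_idle:         True when disk I/O is currently in a lull
    """
    if dirty_bytes == 0:
        return False                 # nothing to write
    if now - last_flush >= interval:
        return True                  # timed interval elapsed
    if dirty_bytes >= threshold:
        return True                  # usage threshold reached
    return io_idle                   # opportunistic flush during a lull


print(should_flush(now=31.0, last_flush=0.0, dirty_bytes=1, io_idle=False))
print(should_flush(now=5.0, last_flush=0.0, dirty_bytes=100, io_idle=False))
```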

This is a key feature that absolutely needs to make it into the base
series, IMO.
I don't understand the use of flushing the bitmap at any time other than
on disk close. If there are no failures with the disk, then the bitmap will
be flushed on close and will be consistent for the next open(). If there is
a disk crash, even if we flush the bitmap regularly, what is the chance of
crashing immediately after the last flush, before any further I/O?

The usage case is QEMU crash, power failure, etc. Not disk crash. If we
periodically flush to HD, we increase the chances that we don't corrupt
our image and bitmap.

If we NEVER flush, we guarantee that any segfault or power outage will
absolutely trash our data.

(10) Storage Efficiency

We should discuss the usage of meta bitmaps or ancillary bitmaps to
record which parts of our bitmap data need to be flushed to disk in
order to reduce flush/close time.

The current meta bitmap implementation optimizes for 1KiB writes to the
network (which fits well under the standard 1500-byte MTU), but perhaps we
could optimize for local storage block size and use this to be stingy
about how much data we decide to write to disk.

I believe this is another feature that should be included in the initial
series as well, because it might radically impact the core design.

Yeah, we might just end up having meta_bitmaps on all the time and rely
on them to know what remains to be written to disk. I think Stefan
wasn't too keen on the idea of a 512GiB disk needing to write a solid
1MiB of data on every close, when in practice we might be able to reduce
it to just a handful of block writes.
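
The 1MiB figure can be checked with a little arithmetic, and the same few lines show what a meta bitmap buys. In this Python sketch the 64KiB granularity is an assumption (it is the value that reproduces the 1MiB figure), and the 4096-byte chunk size and the function names are hypothetical.

```python
KiB, MiB, GiB = 1024, 1024 ** 2, 1024 ** 3


def bitmap_bytes(disk_size, granularity):
    """Size of a dirty bitmap: one bit per `granularity` bytes of disk."""
    bits = -(-disk_size // granularity)  # ceiling division
    return -(-bits // 8)


def dirty_chunks(changed_bit_indexes, chunk_bytes=4096):
    """Meta-bitmap idea: which chunks of bitmap data need rewriting."""
    bits_per_chunk = chunk_bytes * 8
    return sorted({i // bits_per_chunk for i in changed_bit_indexes})


full = bitmap_bytes(512 * GiB, 64 * KiB)
chunks = dirty_chunks([0, 5, 40000])
print(full)                 # bytes written on close without a meta bitmap
print(len(chunks) * 4096)   # bytes written when only dirty chunks are flushed
```

With these assumed numbers the full bitmap is 1048576 bytes, while three changed bits that fall into two chunks need only 8192 bytes written.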

(11) Migration

Stefan already touched on this, but we should be mindful of the
different kinds of migration scenarios.

We might migrate the disks, or they might be shared already.

We might migrate (or share) a disk, but what happens if we didn't
migrate or didn't share the bitmap storage file that we were using?

Bitmaps without persistence data will migrate just fine, but how do we
intend to migrate the persistence data itself? I suppose as a first pass
we can just tap into the migration calls and migrate some properties

"This bitmap relies on node_id=xxxx to save its bitmap"

and that should probably work for either kind of storage migration
tactic. The only problem would be nodes without IDs that we opened by
filename ...
It looks like some bitmaps may be migrated automatically (i.e. created
on the destination if they don't exist), but others may not. This means
that the user should describe bitmaps on the destination command line, at
least the bitmaps loaded from a file rather than a node name. In this case,
migration of a persistent bitmap will succeed if there is a bitmap on the
destination for the same node, with the same name and granularity, and
with the 'file' field set. Otherwise migration fails.

I think I still need to think about this one for a little bit, but I
think there's other work we can do in the meantime at least.

...Another technique would be for any bitmap that is persistent is to
store them all first prior to migration and then allow the destination
to load them anew. This would also work for either shared or migrated
storage if we worked it right.

It seems a little hairy, and I don't have the answers right now...
Something I will ponder on the weekend.

Sorry for the long delay with the next version. Unfortunately it will take even longer, because I'm very busy with other work right now and I'll be on vacation from July 15 till August 3.

Best regards,
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.
