From: Jeff Cody
Subject: Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
Date: Mon, 27 Feb 2012 12:02:10 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120216 Thunderbird/10.0.1
On 02/27/2012 10:46 AM, Kevin Wolf wrote:
> Am 26.02.2012 20:31, schrieb Jeff Cody:
>> This is a QAPI/QMP only command to take a snapshot of a group of
>> devices. This is similar to the blockdev-snapshot-sync command, except
>> blockdev-group-snapshot-sync accepts a list of devices, filenames, and
>> formats.
>>
>> The snapshot of the group is kept as close to atomic as possible: if
>> the creation or open of any of the new snapshots fails, then all of
>> the new snapshots are abandoned, and the name of the snapshot image
>> that failed is returned. The failure case should not interrupt
>> any operations.
>>
>> Rather than use bdrv_close() along with a subsequent bdrv_open() to
>> perform the pivot, the original image is never closed and the new
>> image is placed 'in front' of the original image via manipulation
>> of the BlockDriverState fields. Thus, once the new snapshot image
>> has been successfully created, there are no more failure points
>> before pivoting to the new snapshot.
>>
>> This allows the group of disks to remain consistent with each other,
>> even across snapshot failures.
>>
>> Signed-off-by: Jeff Cody <address@hidden>
>> ---
>> block.c | 47 ++++++++++++++++++++
>> block.h | 1 +
>> blockdev.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> qapi-schema.json | 38 ++++++++++++++++
>> 4 files changed, 214 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 3621d11..0045ab1 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -880,6 +880,53 @@ void bdrv_make_anon(BlockDriverState *bs)
>> bs->device_name[0] = '\0';
>> }
>>
>> +/*
>> + * Add new bs contents at the top of an image chain while the chain is live,
>> + * while keeping required fields on the top layer.
>> + *
>> + * It is assumed that bs_new already points to an existing image,
>> + * with the correct backing filename of top->backing_file
>> + */
>> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
>> +{
>> + BlockDriverState tmp;
>> +
>> + /* the new bs must not be in bdrv_states */
>> + bdrv_make_anon(bs_new);
>> +
>> + tmp = *bs_new;
>> + tmp.backing_hd = bs_new;
>> +
>> + /* there are some fields that need to stay on the top layer: */
>> +
>> + /* dev info */
>> + tmp.dev_ops = bs_top->dev_ops;
>> + tmp.dev_opaque = bs_top->dev_opaque;
>> + tmp.dev = bs_top->dev;
>> + tmp.buffer_alignment = bs_top->buffer_alignment;
>> + tmp.copy_on_read = bs_top->copy_on_read;
>> +
>> + /* i/o timing parameters */
>> + tmp.slice_time = bs_top->slice_time;
>> + tmp.slice_start = bs_top->slice_start;
>> + tmp.slice_end = bs_top->slice_end;
>> + tmp.io_limits = bs_top->io_limits;
>> + tmp.io_base = bs_top->io_base;
>> + tmp.throttled_reqs = bs_top->throttled_reqs;
>> + tmp.block_timer = bs_top->block_timer;
>> + tmp.io_limits_enabled = bs_top->io_limits_enabled;
>
> What about iostatus_enabled/iostatus? Possibly on_read/write_error and
> geometry, too. (Maybe the correct answer is that you're doing the right
> thing, but I wanted to bring it up)
I think you are right; those should probably be preserved as well. I
don't know if it hurts not to preserve them, but since these are all set
from drive_init, I think it makes sense to keep them on the top layer.
>
>> +
>> + /* keep the same entry in bdrv_states */
>> + pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name);
>> + tmp.list = bs_top->list;
>> +
>> + /* swap contents of the fixed new bs and the current top */
>> + *bs_new = *bs_top;
>> + *bs_top = tmp;
>> +
>> + bdrv_detach_dev(bs_new, bs_new->dev);
>> +}
>
> The last line would actually deserve a comment /* clear the copied
> fields in the new backing file */, which makes clear that there are
> probably some more fields missing in this section.
OK, added.
>
>> +
>> void bdrv_delete(BlockDriverState *bs)
>> {
>> assert(!bs->dev);
>> diff --git a/block.h b/block.h
>> index cae289b..190a780 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -114,6 +114,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
>> int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
>> BlockDriverState *bdrv_new(const char *device_name);
>> void bdrv_make_anon(BlockDriverState *bs);
>> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
>> void bdrv_delete(BlockDriverState *bs);
>> int bdrv_parse_cache_flags(const char *mode, int *flags);
>> int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
>> diff --git a/blockdev.c b/blockdev.c
>> index 05e7c5e..560f7e8 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -714,6 +714,134 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file,
>> }
>> }
>>
>> +
>> +/* New and old BlockDriverState structs for group snapshots */
>> +typedef struct BlkGroupSnapshotStates {
>> + BlockDriverState *old_bs;
>> + BlockDriverState *new_bs;
>> + bool is_open;
>> + QSIMPLEQ_ENTRY(BlkGroupSnapshotStates) entry;
>> +} BlkGroupSnapshotStates;
>> +
>> +/*
>> + * 'Atomic' group snapshots. The snapshots are taken as a set, and if any fail
>> + * then we do not pivot any of the devices in the group, and abandon the
>> + * snapshots
>> + */
>> +void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list,
>> + Error **errp)
>
> Too much indentation?
Sure :) Fixed.
>
>> +{
>> + int ret = 0;
>> + SnapshotDevList *dev_entry = dev_list;
>> + SnapshotDev *dev_info = NULL;
>> + BlkGroupSnapshotStates *states;
>> + BlockDriver *proto_drv;
>> + BlockDriver *drv;
>> + int flags;
>> + const char *format;
>> + const char *snapshot_file;
>> +
>> + QSIMPLEQ_HEAD(snap_bdrv_states, BlkGroupSnapshotStates) snap_bdrv_states;
>> + QSIMPLEQ_INIT(&snap_bdrv_states);
>> +
>> + /* We don't do anything in this loop that commits us to the snapshot */
>> + while (NULL != dev_entry) {
>> + dev_info = dev_entry->value;
>> + dev_entry = dev_entry->next;
>> +
>> + states = g_malloc0(sizeof(BlkGroupSnapshotStates));
>> + states->is_open = false;
>> +
>> + states->old_bs = bdrv_find(dev_info->device);
>> +
>> + if (!states->old_bs) {
>> + error_set(errp, QERR_DEVICE_NOT_FOUND, dev_info->device);
>> + goto fail;
>> + }
>> +
>> + if (bdrv_in_use(states->old_bs)) {
>> + error_set(errp, QERR_DEVICE_IN_USE, dev_info->device);
>> + goto fail;
>> + }
>> +
>> + snapshot_file = dev_info->snapshot_file;
>> +
>> + flags = states->old_bs->open_flags;
>> +
>> + if (!dev_info->has_format) {
>> + format = "qcow2";
>> + } else {
>> + format = dev_info->format;
>> + }
>> +
>> + drv = bdrv_find_format(format);
>> + if (!drv) {
>> + error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
>> + goto fail;
>> + }
>> +
>> + proto_drv = bdrv_find_protocol(snapshot_file);
>> + if (!proto_drv) {
>> + error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
>> + goto fail;
>> + }
>> +
>> + /* create new image w/backing file */
>> + ret = bdrv_img_create(snapshot_file, format,
>> + states->old_bs->filename,
>> + drv->format_name, NULL, -1, flags);
>> + if (ret) {
>> + error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
>> + goto fail;
>> + }
>> +
>> + /* We will manually add the backing_hd field to the bs later */
>> + states->new_bs = bdrv_new("");
>> + ret = bdrv_open(states->new_bs, snapshot_file,
>> + flags | BDRV_O_NO_BACKING, drv);
>> + states->is_open = true;
>> +
>> + if (ret != 0) {
>> + error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
>> + goto close_and_fail;
>> + }
>> + QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, states, entry);
>
> Inserting it only here means that all the error cases above leak states.
Yes. Will move it up to immediately after the bdrv_open().
>
>> + }
>> +
>> + /*
>> + * Now we will drain, flush, & pivot everything - we are committed at this
>> + * point.
>> + */
>> + bdrv_drain_all();
>
> I would feel more comfortable if we could do the bdrv_drain_all() at the
> very beginning of the function. Not that I know of a specific scenario
> that would go wrong, but you have a nested main loop here that could do
> more or less anything.
I can move it up to the beginning if desired... My thought was that it
was best to drain closer to the point of commit.
>
>> + QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>> + bdrv_flush(states->old_bs);
>
> This can return an error which must be checked. And of course, we must
> do it before committing to the snapshot (but after bdrv_drain_all).
Hmm. If the flush returns error, do we abandon at this point? Perhaps it
would be best to loop through and flush each device first, and if no
flushes fail, _then_ loop through and perform the bdrv_append(). Once we
are calling bdrv_append(), we want no possible failure points.
>
>> + /* This removes our old bs from the bdrv_states, and adds the new bs */
>> + bdrv_append(states->new_bs, states->old_bs);
>> + }
>> +
>> + /* success */
>> + goto exit;
>> +
>> +close_and_fail:
>> + /*
>> + * failure, and it is all-or-none; abandon each new bs, and keep using
>> + * the original bs for all images
>> + */
>> + QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>> + if (states->is_open) {
>> + bdrv_close(states->new_bs);
>
> bdrv_delete, as Stefan already mentioned.
>
>> + }
>> + }
>> +fail:
>> +exit:
>> + QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>> + g_free(states);
>> + }
>> + return;
>> +}
>
> Kevin