qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH V6 1/3] Implement sync modes for drive-backup.


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH V6 1/3] Implement sync modes for drive-backup.
Date: Wed, 24 Jul 2013 12:55:43 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Am 23.07.2013 um 00:09 hat Ian Main geschrieben:
> This patch adds sync-modes to the drive-backup interface and
> implements the FULL, NONE and TOP modes of synchronization.
> 
> FULL performs as before copying the entire contents of the drive
> while preserving the point-in-time using CoW.
> NONE only copies new writes to the target drive.
> TOP copies changes to the topmost drive image and preserves the
> point-in-time using CoW.
> 
> For sync mode TOP are creating a new target image using the same backing
> file as the original disk image.  Then any new data that has been laid
> on top of it since creation is copied in the main backup_run() loop.
> There is an extra check in the 'TOP' case so that we don't bother to copy
> all the data of the backing file as it already exists in the target.
> This is where the bdrv_co_is_allocated() is used to determine if the
> data exists in the topmost layer or below.
> 
> Also any new data being written is intercepted via the write_notifier
> hook which ends up calling backup_do_cow() to copy old data out before
> it gets overwritten.
> 
> For mode 'NONE' we create the new target image and only copy in the
> original data from the disk image starting from the time the call was
> made.  This preserves the point in time data by only copying the parts
> that are *going to change* to the target image.  This way we can
> reconstruct the final image by checking to see if the given block exists
> in the new target image first, and if it does not, you can get it from
> the original image.  This is basically an optimization allowing you to
> do point-in-time snapshots with low overhead vs the 'FULL' version.
> 
> Since there is no old data to copy out the loop in backup_run() for the
> NONE case just calls qemu_coroutine_yield() which only wakes up after
> an event (usually cancel in this case).  The rest is handled by the
> before_write notifier which again calls backup_do_cow() to write out
> the old data so it can be preserved.
> 
> Signed-off-by: Ian Main <address@hidden>
> ---
>  block/backup.c            | 91 
> +++++++++++++++++++++++++++++++----------------
>  blockdev.c                | 36 ++++++++++++-------
>  include/block/block_int.h |  4 ++-
>  qapi-schema.json          |  4 +--
>  qmp-commands.hx           |  2 ++
>  5 files changed, 92 insertions(+), 45 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 16105d4..68abd23 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -37,6 +37,7 @@ typedef struct CowRequest {
>  typedef struct BackupBlockJob {
>      BlockJob common;
>      BlockDriverState *target;
> +    MirrorSyncMode sync_mode;
>      RateLimit limit;
>      BlockdevOnError on_source_error;
>      BlockdevOnError on_target_error;
> @@ -247,40 +248,69 @@ static void coroutine_fn backup_run(void *opaque)
>  
>      bdrv_add_before_write_notifier(bs, &before_write);
>  
> -    for (; start < end; start++) {
> -        bool error_is_read;
> -
> -        if (block_job_is_cancelled(&job->common)) {
> -            break;
> +    if (job->sync_mode == MIRROR_SYNC_MODE_NONE) {
> +        while (!block_job_is_cancelled(&job->common)) {
> +            /* Yield until the job is cancelled.  We just let our 
> before_write
> +             * notify callback service CoW requests. */
> +            job->common.busy = false;
> +            qemu_coroutine_yield();
> +            job->common.busy = true;
>          }
> +    } else {
> +        /* Both FULL and TOP SYNC_MODE's require copying.. */
> +        for (; start < end; start++) {
> +            bool error_is_read;
>  
> -        /* we need to yield so that qemu_aio_flush() returns.
> -         * (without, VM does not reboot)
> -         */
> -        if (job->common.speed) {
> -            uint64_t delay_ns = ratelimit_calculate_delay(
> -                &job->limit, job->sectors_read);
> -            job->sectors_read = 0;
> -            block_job_sleep_ns(&job->common, rt_clock, delay_ns);
> -        } else {
> -            block_job_sleep_ns(&job->common, rt_clock, 0);
> -        }
> +            if (block_job_is_cancelled(&job->common)) {
> +                break;
> +            }
>  
> -        if (block_job_is_cancelled(&job->common)) {
> -            break;
> -        }
> +            /* we need to yield so that qemu_aio_flush() returns.
> +             * (without, VM does not reboot)
> +             */
> +            if (job->common.speed) {
> +                uint64_t delay_ns = ratelimit_calculate_delay(
> +                        &job->limit, job->sectors_read);
> +                job->sectors_read = 0;
> +                block_job_sleep_ns(&job->common, rt_clock, delay_ns);
> +            } else {
> +                block_job_sleep_ns(&job->common, rt_clock, 0);
> +            }
>  
> -        ret = backup_do_cow(bs, start * BACKUP_SECTORS_PER_CLUSTER,
> -                            BACKUP_SECTORS_PER_CLUSTER, &error_is_read);
> -        if (ret < 0) {
> -            /* Depending on error action, fail now or retry cluster */
> -            BlockErrorAction action =
> -                backup_error_action(job, error_is_read, -ret);
> -            if (action == BDRV_ACTION_REPORT) {
> +            if (block_job_is_cancelled(&job->common)) {
>                  break;
> -            } else {
> -                start--;
> -                continue;
> +            }
> +
> +            if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
> +                int n, alloced;
> +
> +                /* Check to see if these blocks are already in the
> +                 * backing file. */
> +
> +                alloced =
> +                    bdrv_co_is_allocated(bs, start * 
> BACKUP_SECTORS_PER_CLUSTER,
> +                                         BACKUP_SECTORS_PER_CLUSTER, &n);
> +                /* The above call returns true if the given sector is in the
> +                 * topmost image.  If that is the case then we must copy it 
> as
> +                 * it has been modified from the original backing file.
> +                 * Otherwise we skip it. */
> +                if (alloced == 0) {
> +                    continue;

At first I was going to suggest that you use n for optimising the case
where you can skip multiple clusters.

But I think we have a much more serious problem: We're working at
BACKUP_SECTORS_PER_CLUSTER granularity, which currently consists of 128
512-byte sectors, each of which can have a different allocation status.

If the first one is unallocated, you skip the other 127 sectors as well,
without ever checking if they need to be copied. In the same way, the
code below can copy sectors which shouldn't be copied in fact. (When
fixing this, be aware that backup_do_cow() does round its parameters
once more.)

> +                }
> +            }
> +            /* FULL sync mode we copy the whole drive. */
> +            ret = backup_do_cow(bs, start * BACKUP_SECTORS_PER_CLUSTER,
> +                    BACKUP_SECTORS_PER_CLUSTER, &error_is_read);
> +            if (ret < 0) {
> +                /* Depending on error action, fail now or retry cluster */
> +                BlockErrorAction action =
> +                    backup_error_action(job, error_is_read, -ret);
> +                if (action == BDRV_ACTION_REPORT) {
> +                    break;
> +                } else {
> +                    start--;
> +                    continue;
> +                }
>              }
>          }
>      }
> @@ -300,7 +330,7 @@ static void coroutine_fn backup_run(void *opaque)
>  }
>  
>  void backup_start(BlockDriverState *bs, BlockDriverState *target,
> -                  int64_t speed,
> +                  int64_t speed, MirrorSyncMode sync_mode,
>                    BlockdevOnError on_source_error,
>                    BlockdevOnError on_target_error,
>                    BlockDriverCompletionFunc *cb, void *opaque,
> @@ -335,6 +365,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState 
> *target,
>      job->on_source_error = on_source_error;
>      job->on_target_error = on_target_error;
>      job->target = target;
> +    job->sync_mode = sync_mode;
>      job->common.len = len;
>      job->common.co = qemu_coroutine_create(backup_run);
>      qemu_coroutine_enter(job->common.co, job);
> diff --git a/blockdev.c b/blockdev.c
> index c5abd65..482029e 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1430,17 +1430,13 @@ void qmp_drive_backup(const char *device, const char 
> *target,
>                        Error **errp)
>  {
>      BlockDriverState *bs;
> -    BlockDriverState *target_bs;
> +    BlockDriverState *target_bs, *source;
>      BlockDriver *drv = NULL;
>      Error *local_err = NULL;
>      int flags;
>      int64_t size;
>      int ret;
>  
> -    if (sync != MIRROR_SYNC_MODE_FULL) {
> -        error_setg(errp, "only sync mode 'full' is currently supported");
> -        return;
> -    }
>      if (!has_speed) {
>          speed = 0;
>      }
> @@ -1465,10 +1461,13 @@ void qmp_drive_backup(const char *device, const char 
> *target,
>          return;
>      }
>  
> -    if (!has_format) {
> -        format = mode == NEW_IMAGE_MODE_EXISTING ? NULL : 
> bs->drv->format_name;
> -    }
> -    if (format) {
> +    if (mode == NEW_IMAGE_MODE_EXISTING) {
> +        format = bs->drv->format_name;
> +    } else {
> +        if (!has_format) {
> +            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
> +            return;
> +        }
>          drv = bdrv_find_format(format);
>          if (!drv) {
>              error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);

Is this part really related to the new sync modes or should it be a
separate patch?

Unconditionally overriding format for NEW_IMAGE_MODE_EXISTING is
definitely wrong. It's the user's choice which COW format to use for the
backup image. There's no reason why it has to be the same format as the
image that is being backed up.

Before, bs->drv->format_name was a default for the case where a new
image had to be created and no format was given; and the format of
existing images could be probed. This is still what makes most sense to
me. What's even the goal with this change?

> @@ -1483,6 +1482,13 @@ void qmp_drive_backup(const char *device, const char 
> *target,
>  
>      flags = bs->open_flags | BDRV_O_RDWR;
>  
> +    /* See if we have a backing HD we can use to create our new image
> +     * on top of. */
> +    source = bs->backing_hd;
> +    if (!source && sync == MIRROR_SYNC_MODE_TOP) {
> +        sync = MIRROR_SYNC_MODE_FULL;
> +    }
> +
>      size = bdrv_getlength(bs);
>      if (size < 0) {
>          error_setg_errno(errp, -size, "bdrv_getlength failed");
> @@ -1491,8 +1497,14 @@ void qmp_drive_backup(const char *device, const char 
> *target,
>  
>      if (mode != NEW_IMAGE_MODE_EXISTING) {
>          assert(format && drv);
> -        bdrv_img_create(target, format,
> -                        NULL, NULL, NULL, size, flags, &local_err, false);
> +        if (sync == MIRROR_SYNC_MODE_TOP) {
> +            bdrv_img_create(target, format, source->filename,
> +                            source->drv->format_name, NULL,
> +                            size, flags, &local_err, false);
> +        } else {
> +            bdrv_img_create(target, format, NULL, NULL, NULL,
> +                            size, flags, &local_err, false);
> +        }
>      }
>  
>      if (error_is_set(&local_err)) {
> @@ -1508,7 +1520,7 @@ void qmp_drive_backup(const char *device, const char 
> *target,
>          return;
>      }
>  
> -    backup_start(bs, target_bs, speed, on_source_error, on_target_error,
> +    backup_start(bs, target_bs, speed, sync, on_source_error, 
> on_target_error,
>                   block_job_cb, bs, &local_err);
>      if (local_err != NULL) {
>          bdrv_delete(target_bs);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index c6ac871..e45f2a0 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -404,6 +404,7 @@ void mirror_start(BlockDriverState *bs, BlockDriverState 
> *target,
>   * @bs: Block device to operate on.
>   * @target: Block device to write to.
>   * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
> + * @sync_mode: What parts of the disk image should be copied to the 
> destination.
>   * @on_source_error: The action to take upon error reading from the source.
>   * @on_target_error: The action to take upon error writing to the target.
>   * @cb: Completion function for the job.
> @@ -413,7 +414,8 @@ void mirror_start(BlockDriverState *bs, BlockDriverState 
> *target,
>   * until the job is cancelled or manually completed.
>   */
>  void backup_start(BlockDriverState *bs, BlockDriverState *target,
> -                  int64_t speed, BlockdevOnError on_source_error,
> +                  int64_t speed, MirrorSyncMode sync_mode,
> +                  BlockdevOnError on_source_error,
>                    BlockdevOnError on_target_error,
>                    BlockDriverCompletionFunc *cb, void *opaque,
>                    Error **errp);
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 7b9fef1..8a9bcc5 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1816,8 +1816,8 @@
>  #          is a device, the existing file/device will be used as the new
>  #          destination.  If it does not exist, a new file will be created.
>  #
> -# @format: #optional the format of the new destination, default is to
> -#          probe if @mode is 'existing', else the format of the source
> +# @format: #optional If @mode is 'existing' the format will be probed.
> +#          If the format requires a new file then format is required.

This modifies drive-mirror rather than drive-backup.

Also, the last sentence doesn't make sense. Is s/format/mode/ what you
meant?

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]