
From: Paolo Bonzini
Subject: [Qemu-devel] Block job commands in QEMU 1.2 [v2, including support for replication]
Date: Thu, 24 May 2012 15:41:29 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1

changes from v1:
- added per-job iostatus
- added description of persistent dirty bitmap

The same content is also at
http://wiki.qemu.org/Features/LiveBlockMigration/1.2


QMP changes for error handling
==============================

* query-block-jobs: BlockJobInfo gets two new fields, paused and
io-status.  The job-specific iostatus is completely separate from the
block device iostatus.


* block-stream: I would still like to add on_error to the existing
block-stream command, if only to ease unit testing.  Concerns about the
stability of the API can be handled by adding introspection (exporting
the schema), which is not hard to do.  The new option is an enum with
the following possible values:

'report': The behavior is the same as in 1.1.  An I/O error will
complete the job immediately with an error code.

'ignore': An I/O error, whether during a read or a write, will be
ignored.  For streaming, the job will complete with an error and the
backing file will be left in place.  For mirroring, the sector will be
marked dirty again and re-examined later.

'stop': The job will be paused, and the job iostatus (which can be
examined with query-block-jobs) is updated.

'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others.

In all cases, even for 'report', the I/O error is reported as a QMP
event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR.
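
As an illustration of the four on_error values, here is a minimal
Python sketch of the decision a job would make when it hits an I/O
error.  The names OnError, Action and error_action are hypothetical and
only model the rules described above; they are not QEMU code.

```python
import errno
from enum import Enum


class OnError(Enum):
    """The proposed on_error values (names are illustrative)."""
    REPORT = "report"
    IGNORE = "ignore"
    STOP = "stop"
    ENOSPC = "enospc"


class Action(Enum):
    """What the job does with one I/O error."""
    REPORT = "report"   # complete the job immediately with an error code
    IGNORE = "ignore"   # keep the job running
    STOP = "stop"       # pause the job and update its iostatus


def error_action(policy: OnError, err: int) -> Action:
    """Return the action for an I/O error with errno `err` under `policy`."""
    if policy is OnError.ENOSPC:
        # 'enospc' behaves as 'stop' for ENOSPC and as 'report' otherwise.
        return Action.STOP if err == errno.ENOSPC else Action.REPORT
    # The remaining policies map one-to-one onto an action.
    return Action(policy.value)
```

The BLOCK_JOB_ERROR event is not modelled here because it is emitted in
all four cases.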

After cancelling a job, the job implementation MAY choose to treat the
'stop' and 'enospc' values as 'report', i.e. complete the job
immediately with an error code, as long as block_job_is_cancelled(job)
returns true when the completion callback is called.

  Open problem: There could be unrecoverable errors in which the job
  will always fail as if on_error were set to 'report' (example: error
  while switching backing files).  Does it make sense to fire an event
  before the point in time where such errors can happen?


* block-job-pause: A new QMP command.  Takes a block device (drive) and
pauses an active background block operation on that device.  This
command returns immediately after marking the active background block
operation for pausing.  It is an error to call this command if no
operation is in progress.  The operation will pause as soon as possible
(it won't pause if the job is being cancelled).  No event is emitted
when the operation is actually paused.  Cancelling a paused job
automatically resumes it.


* block-job-resume: A new QMP command.  Takes a block device (drive) and
resumes a paused background block operation on that device.  This command
returns immediately after resuming a paused background block operation.
It is an error to call this command if no operation is in progress.

A successful block-job-resume operation also resets the iostatus on the
job that is passed.

  Rationale: block-job-resume is required to restart a job that had
  on_error behavior set to 'stop' or 'enospc'.  Adding block-job-pause
  makes it simpler to test the new feature.
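
The pause/resume rules above can be summarized with a toy state model.
This is only a sketch: the BlockJob class, its method names, and the
iostatus strings ("ok", "failed", "nospace") are assumptions made for
illustration, and whether cancelling also resets the iostatus is left
open here.

```python
class BlockJob:
    """Toy model of the proposed pause/resume rules; not QEMU code."""

    def __init__(self):
        self.paused = False
        self.cancelled = False
        self.iostatus = "ok"  # assumed value names

    def pause(self):
        # block-job-pause: a job that is being cancelled does not pause.
        if not self.cancelled:
            self.paused = True

    def resume(self):
        # block-job-resume: un-pause and reset the job's iostatus.
        self.paused = False
        self.iostatus = "ok"

    def cancel(self):
        # Cancelling a paused job automatically resumes it.
        self.cancelled = True
        self.paused = False

    def stop_on_error(self, iostatus="failed"):
        # on_error 'stop' (or 'enospc' on ENOSPC): record the error in the
        # job iostatus and pause until block-job-resume is issued.
        self.iostatus = iostatus
        self.pause()


job = BlockJob()
job.stop_on_error("nospace")   # job is now paused with iostatus "nospace"
job.resume()                   # job runs again, iostatus back to "ok"
```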


Other points specific to mirroring
==================================

* query-block-jobs: The returned JSON object will grow an additional
member, "target".  The target field is a dictionary with two fields,
"info" and "stats" (resembling the output of query-block and
query-blockstat but for the mirroring target).  Member "device" of the
BlockInfo structure will be made optional.

  Rationale: this allows libvirt to observe the high watermark of qcow2
  mirroring targets.

If present, the target has its own iostatus.  It is set when the job is
paused due to an error on the target (together with sending a
BLOCK_JOB_ERROR event). block-job-resume resets it.


* drive-mirror: activates mirroring to a second block device (optionally
creating the image on that second block device).  Compared to the
earlier versions, the "full" argument is replaced by an enum option
"sync" with three values:

- top: copies data in the topmost image to the destination

- full: copies data from all images to the destination

- dirty: copies clusters that are marked in the dirty bitmap to the
destination (see below)
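
For illustration, a drive-mirror request with the new "sync" option
could be built as in the Python sketch below.  The "execute" and
"arguments" keys are the usual QMP message layout; the argument names
"device" and "target" and the helper drive_mirror_command are
assumptions, since this proposal does not spell out the schema.

```python
import json

# The three values proposed for the "sync" enum.
SYNC_MODES = ("top", "full", "dirty")


def drive_mirror_command(device: str, target: str, sync: str) -> str:
    """Serialize a drive-mirror request as a QMP JSON message."""
    if sync not in SYNC_MODES:
        raise ValueError(f"sync must be one of {SYNC_MODES}, got {sync!r}")
    message = {
        "execute": "drive-mirror",
        "arguments": {
            "device": device,   # assumed argument name
            "target": target,   # assumed argument name
            "sync": sync,
        },
    }
    return json.dumps(message)


request = drive_mirror_command("ide0-hd0", "/mnt/dest/diskname.img", "top")
```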


* block-job-complete: forces completion of mirroring and switches the
device to the target.  This command is not related to the rest of the
proposal.  It synchronously opens backing files if needed and
asynchronously completes the job.


* MIRROR_STATE_CHANGE: new event, triggered every time
block-job-complete becomes available/unavailable.  Contains the device
name (like device: 'ide0-hd0'), and the state (synced: true/false).
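
A management tool could track this event to know when
block-job-complete may be issued.  The Python sketch below assumes the
event arrives in the usual QMP shape, with the fields under a "data"
member; the exact layout and the helper name update_sync_state are
assumptions.

```python
import json


def update_sync_state(states: dict, raw_event: str) -> dict:
    """Record the latest 'synced' flag per device from a QMP event line."""
    event = json.loads(raw_event)
    if event.get("event") == "MIRROR_STATE_CHANGE":
        data = event["data"]
        # True means block-job-complete is currently available.
        states[data["device"]] = bool(data["synced"])
    return states


states = {}
update_sync_state(
    states,
    '{"event": "MIRROR_STATE_CHANGE",'
    ' "data": {"device": "ide0-hd0", "synced": true}}',
)
```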


Persistent dirty bitmap
=======================

A persistent dirty bitmap can be used by management for two purposes.
When mirroring is used for continuous replication of storage, the
bitmap records I/O operations that happen while the replication server
is disconnected or unavailable.  When mirroring is used for storage
migration, the bitmap lets management check after a management crash
whether the VM must be restarted with the source or the destination.

The dirty bitmap is synchronized on every bdrv_flush (or on every I/O
operation if the disk operates in writethrough or directsync mode).

The persistent dirty bitmap is created by management, but QEMU also
needs a dirty bitmap for drive-mirror.  Therefore:

* if management has not set up a persistent dirty bitmap, QEMU will use
a simple non-persistent bitmap.

* if management has set up a persistent dirty bitmap and later calls
blockdev-dirty-disable, QEMU will delay the disabling until drive
mirroring also terminates.


The dirty bitmap is managed by these QMP commands:

* blockdev-dirty-enable: takes a file name used for the dirty bitmap,
and an optional granularity.  Setting the granularity will not be
supported in the initial version.

* query-block-dirty: returns statistics about the dirty bitmap: right
now the granularity, the number of bits that are set, and whether QEMU
is using the dirty bitmap or just adding to it.

* blockdev-dirty-disable: disable the dirty bitmap.


The dirty bitmap can also be specified on the command-line with -drive.

The dirty bitmap can be used as follows for storage migration.  To start
migration:

1) blockdev-dirty-enable ide0-hd0 /var/lib/libvirt/dirty/diskname

2) management notes existence of dirty bitmap for /mnt/src/diskname.img
in its private data

3) drive-mirror ide0-hd0 /mnt/dest/diskname.img

4) management notes /mnt/dest/diskname.img as the mirroring target in
its private data

At this point, mirroring has taken a reference to the dirty bitmap.  To
end migration:

5) blockdev-dirty-disable ide0-hd0

6) block-job-complete ide0-hd0

The dirty bitmap remains enabled until the BLOCK_JOB_COMPLETED event is
sent.

7) When management receives the BLOCK_JOB_COMPLETED event, it notes the
switch to /mnt/dest/diskname.img (with neither a dirty bitmap nor a
mirroring target) in its private data.

If management crashes between (6) and (7), it can examine the dirty
bitmap on disk.  If it is all-zeros, management can restart the virtual
machine with /mnt/dest/diskname.img.  If it has even a single non-zero
bit, management can restart the virtual machine with the persistent
dirty bitmap enabled, and later issue a drive-mirror command again to
restart from step 4.
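
The recovery check can be sketched in Python as follows.  The on-disk
format of the bitmap is not defined in this proposal, so the sketch
assumes a raw file with no header in which any non-zero byte means at
least one dirty bit; the function names are hypothetical.

```python
def bitmap_is_clean(bitmap_path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if every byte of the bitmap file is zero.

    Assumes a raw, header-less bitmap file (an assumption, see above).
    """
    with open(bitmap_path, "rb") as f:
        while chunk := f.read(chunk_size):
            if any(chunk):  # at least one non-zero byte, so a bit is set
                return False
    return True


def image_to_restart_with(bitmap_path: str, source: str, dest: str) -> str:
    """Pick the image for restarting the VM after a management crash."""
    # All-zeros: the destination is complete and can be used directly.
    # Otherwise: restart on the source with the bitmap still enabled and
    # issue drive-mirror again later.
    return dest if bitmap_is_clean(bitmap_path) else source
```

With the paths used in the steps above, management would call
image_to_restart_with("/var/lib/libvirt/dirty/diskname",
"/mnt/src/diskname.img", "/mnt/dest/diskname.img").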

Paolo


