Re: [Qemu-devel] live block copy/stream/snapshot discussion


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] live block copy/stream/snapshot discussion
Date: Tue, 12 Jul 2011 16:45:22 +0100

On Tue, Jul 12, 2011 at 9:06 AM, Kevin Wolf <address@hidden> wrote:
> Am 11.07.2011 18:32, schrieb Marcelo Tosatti:
>> On Mon, Jul 11, 2011 at 03:47:15PM +0100, Stefan Hajnoczi wrote:
>>> Kevin, Marcelo,
>>> I'd like to reach agreement on the QMP/HMP APIs for live block copy
>>> and image streaming.  Libvirt has acked the image streaming APIs that
>>> Adam proposed and I think they are a good fit for the feature.  I have
>>> described that API below for your review (it's exactly what the QED
>>> Image Streaming patches provide).
>>>
>>> Marcelo: Are you happy with this API for live block copy?  Also please
>>> take a look at the switch command that I am proposing.
>>>
>>> Image streaming API
>>> ===================
>>>
>>> For leaf images with copy-on-read semantics, the stream commands allow the
>>> user to populate local blocks by manually streaming them from the backing
>>> image.  Once all blocks have been streamed, the dependency on the original
>>> backing image can be removed.  Therefore, stream commands can be used to
>>> implement post-copy live block migration and rapid deployment.
>>>
>>> The block_stream command can be used to stream a single cluster, to
>>> start streaming the entire device, and to cancel an active stream.  It
>>> is easiest to allow the block_stream command to manage streaming for the
>>> entire device but a management tool could use single cluster mode to
>>> throttle the I/O rate.
>
> As discussed earlier, having the management send requests for each
> single cluster doesn't make any sense at all. It wouldn't only throttle
> the I/O rate but bring it down to a level that makes it unusable. What
> you really want is to allow the management to give us a range (offset +
> length) that qemu should stream.

I feel that an iteration interface is problematic whether the
management tool or QEMU decides what to stream.  Let's have just the
background streaming operation.

The problem with byte ranges is two-fold.  First, the management tool
doesn't know which regions of the image are allocated, so it may make a
lot of no-op calls to already-allocated regions with no intelligence as
to where the next sensible offset for streaming is.  Second, because
the progress and performance of image streaming depend largely on
whether or not clusters are allocated (it is very fast when a cluster
is already allocated and we have no work to do), offsets are bad
indicators of progress to the user.  I think it's best not to expose
these details to the management tool at all.

The only reason for the iteration interface was to punt I/O throttling
to the management tool.  I think it would be easier to just throttle
inside the streaming function.
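
A minimal sketch of what throttling inside the streaming function could
look like, written in Python for illustration only.  The names
stream_throttled, is_allocated, copy_cluster and max_copies_per_sec are
hypothetical and are not taken from the QED Image Streaming patches; the
point is only that already-allocated clusters are skipped without
consuming any of the I/O budget.

```python
import time


def stream_throttled(total_clusters, is_allocated, copy_cluster,
                     max_copies_per_sec, sleep=time.sleep,
                     clock=time.monotonic):
    """Walk every cluster and copy the unallocated ones at a bounded rate.

    All parameter names are assumptions made for this sketch.
    Returns the number of clusters that were actually copied.
    """
    interval = 1.0 / max_copies_per_sec
    copied = 0
    next_slot = clock()  # earliest time the next copy may start
    for index in range(total_clusters):
        if is_allocated(index):
            continue  # no work to do, so no throttling delay either
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)
        copy_cluster(index)
        copied += 1
        next_slot = max(next_slot, clock()) + interval
    return copied
```

With this shape the management tool only sets a rate; it never needs to
know which offsets are allocated.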

Kevin: Are you happy with dropping the iteration interface?
Adam: Is there a libvirt requirement for iteration or could we support
background copy only?

>>> The command synopses are as follows:
>>>
>>> block_stream
>>> ------------
>>>
>>> Copy data from a backing file into a block device.
>>>
>>> If the optional 'all' argument is true, this operation is performed in the
>>> background until the entire backing file has been copied.  The status of
>>> ongoing block_stream operations can be checked with query-block-stream.
>
> Not sure if it's a good idea to use a bool argument to turn a command
> into its opposite. I think having a separate command for stopping would
> be cleaner. Something for the QMP folks to decide, though.

git branch new_branch
git branch -D new_branch

Makes sense to me :)

>>> Arguments:
>>>
>>> - all:    copy entire device (json-bool, optional)
>>> - stop:   stop copying to device (json-bool, optional)
>>> - device: device name (json-string)
>>
>> It must be possible to specify backing file that will be
>> active after streaming finishes (data from that file will not
>> be streamed into active file, of course).
>
> Yes, I think the common base image belongs here.

Right.  We need to specify it by filename:

  - base: filename of base file (json-string, optional)

  Sectors are not copied from the base file and its backing file
  chain.  The following describes this feature:
    Before: base <- sn1 <- sn2 <- sn3 <- vm.img
    After:  base <- vm.img
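
As a rough illustration of the Before/After chains, the following Python
sketch works out which images contribute sectors and what the chain
looks like afterwards.  The function images_to_stream and its list-based
chain representation are assumptions made for this example, not part of
the proposed API.

```python
def images_to_stream(chain, base=None):
    """Split a backing chain into (streamed images, resulting chain).

    chain is ordered from the oldest backing file to the active image.
    base is the filename to keep as the backing file afterwards; with no
    base, every backing file is streamed into the active image.
    """
    active = chain[-1]
    backing = chain[:-1]
    if base is None:
        return backing, [active]
    if base not in backing:
        raise ValueError("base is not a backing file of the active image")
    cut = backing.index(base)
    # base and everything below it stay in place and are not copied
    return backing[cut + 1:], chain[:cut + 1] + [active]


streamed, after = images_to_stream(
    ["base", "sn1", "sn2", "sn3", "vm.img"], base="base")
print(streamed)
print(after)
```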

> With all = false, where does the streaming begin?

Streaming begins at the start of the image.

> Do you have something like the "current streaming offset" in the state of 
> each BlockDriverState?

Yes, there is a StreamState for each block device that has an
in-progress operation.  The progress is saved between block_stream
(without -a) invocations so the caller does not need to specify the
streaming offset as an argument.
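
To make the saved-progress behaviour concrete, here is a small Python
model of per-device state.  It is only a sketch: the class layout, the
stream_next method and the chunk_size parameter are invented for
illustration and do not mirror the actual StreamState structure in the
patches.

```python
from dataclasses import dataclass


@dataclass
class StreamState:
    """Hypothetical per-device progress record."""
    device: str
    length: int      # size of the device, in bytes
    offset: int = 0  # ending offset of the completed I/O, in bytes

    def stream_next(self, chunk_size):
        """Run one iteration, resuming where the previous call stopped."""
        self.offset = min(self.offset + chunk_size, self.length)
        return {"device": self.device, "len": self.length,
                "offset": self.offset}

    @property
    def done(self):
        return self.offset >= self.length


# One state object per device with an in-progress operation.
states = {"virtio0": StreamState("virtio0", length=1536)}
first = states["virtio0"].stream_next(512)
second = states["virtio0"].stream_next(512)
```

The caller passes no offset in either call; the second iteration simply
continues from the offset recorded by the first.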

Thanks for pointing out these weaknesses in the documentation.  It
should really be explained fully.

>>> Return:
>>>
>>> - device: device name (json-string)
>>> - len:    size of the device, in bytes (json-int)
>>> - offset: ending offset of the completed I/O, in bytes (json-int)
>
> So you only get the reply when the request has completed? With the
> current monitor, this means that QMP is blocked while we stream, doesn't
> it? How are you supposed to send the stop command then?

Incomplete documentation again, sorry.  The block_stream command
behaves as follows:

1. block_stream all returns immediately and the BLOCK_STREAM_COMPLETED
event is raised when streaming completes either successfully or with
an error.

2. block_stream stop returns when the in-progress streaming operation
has been safely stopped.

3. block_stream returns when one iteration of streaming has completed.
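
The three cases can be summarised from the client side with a short
Python sketch.  The helper names block_stream_command and returns_when
are hypothetical; only the command name, its arguments and the
BLOCK_STREAM_COMPLETED event come from the proposal above.

```python
import json


def block_stream_command(device, stream_all=False, stop=False):
    """Build the QMP message for one of the three block_stream forms."""
    if stream_all and stop:
        raise ValueError("'all' and 'stop' cannot be combined")
    arguments = {"device": device}
    if stream_all:
        arguments["all"] = True
    if stop:
        arguments["stop"] = True
    return {"execute": "block_stream", "arguments": arguments}


def returns_when(command):
    """Describe when the reply arrives, following cases 1 to 3."""
    arguments = command["arguments"]
    if arguments.get("all"):
        return "immediately; BLOCK_STREAM_COMPLETED is raised later"
    if arguments.get("stop"):
        return "once the in-progress stream has been safely stopped"
    return "once one iteration of streaming has completed"


for cmd in (block_stream_command("virtio0", stream_all=True),
            block_stream_command("virtio0", stop=True),
            block_stream_command("virtio0")):
    print(json.dumps(cmd), "->", returns_when(cmd))
```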

> Two of three examples below have an empty return value instead, so they
> are not compliant to this specification.

I will update the documentation.  The non-all invocations do not return anything.

>>> Examples:
>>>
>>> -> { "execute": "block_stream", "arguments": { "device": "virtio0" } }
>>> <- { "return":  { "device": "virtio0", "len": 10737418240, "offset": 512 } }
>>>
>>> -> { "execute": "block_stream", "arguments": { "all": true, "device":
>>> "virtio0" } }
>>> <- { "return": {} }
>>>
>>> -> { "execute": "block_stream", "arguments": { "stop": true, "device":
>>> "virtio0" } }
>>> <- { "return": {} }
>>>
>>> query-block-stream
>>> ------------------
>>>
>>> Show progress of ongoing block_stream operations.
>>>
>>> Return a json-array of all operations.  If no operation is active then an
>>> empty array will be returned.  Each operation is a json-object with the
>>> following data:
>>>
>>> - device: device name (json-string)
>>> - len:    size of the device, in bytes (json-int)
>>> - offset: ending offset of the completed I/O, in bytes (json-int)
>>>
>>> Example:
>>>
>>> -> { "execute": "query-block-stream" }
>>> <- { "return":[
>>>        { "device": "virtio0", "len": 10737418240, "offset": 709632}
>>>     ]
>>>   }
>
> When block_stream is changed, this will have to make the same changes.
>
>>> Block device switching API
>>> ==========================
>>>
>>> Extend the 'change' command to support changing the image file without
>>> media change notification.
>>>
>>> Perhaps we should take the opportunity to add a "format" argument for
>>> image files?
>>>
>>> change
>>> ------
>>>
>>> Change a removable medium or VNC configuration.
>>>
>>> Arguments:
>>>
>>> - "device": device name (json-string)
>>> - "target": filename or item (json-string)
>>> - "arg": additional argument (json-string, optional)
>>> - "notify": whether to notify guest, defaults to true (json-bool, optional)
>>>
>>> Examples:
>>>
>>> 1. Change a removable medium
>>>
>>> -> { "execute": "change",
>>>              "arguments": { "device": "ide1-cd0",
>>>                             "target": "/srv/images/Fedora-12-x86_64-DVD.iso" } }
>>> <- { "return": {} }
>>>
>>> 2. Change a disk without media change notification
>>>
>>> -> { "execute": "change",
>>>              "arguments": { "device": "virtio-blk0",
>>>                             "target": "/srv/images/vm_1.img",
>>>                             "notify": false } }
>>>
>>> 3. Change VNC password
>>>
>>> -> { "execute": "change",
>>>              "arguments": { "device": "vnc", "target": "password",
>>>                             "arg": "foobar1" } }
>>> <- { "return": {} }
>
> I find it rather disturbing that a command like 'change' has made it
> into QMP... Anyway, I don't think this is really what we need.
>
> We have two switches to do. The first one happens before starting the
> copy: Creating the copy, with the source as its backing file, and
> switching to that. The monitor command to achieve this is snapshot_blkdev.

I don't think that creating image files in QEMU is going to work when
running KVM with libvirt (SELinux).  The QEMU process does not have
the ability to create new image files.  It needs at least a file
descriptor to an empty file or maybe a file that has been created
using qemu-img like I showed above.

> The second switch is after the copy has completed. At this point you can
> remove the source as the backing file and use the common base image
> instead. This is a call to bdrv_change_backing_file(), for which a
> monitor command doesn't exist yet (and unless we want to overload
> 'change' even more, it's not the right command to do this).

I agree.  We need the ability to change the backing file (aka qemu-img
rebase -u).
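
Until such a monitor command exists, the offline equivalent is the
qemu-img invocation mentioned above.  The Python sketch below only
builds the argument list; the function name and the example filenames
are placeholders, and nothing is executed.

```python
import shlex


def rebase_unsafe_argv(image, new_backing_file):
    """Return the argv for 'qemu-img rebase -u'.

    -u rewrites only the backing file name stored in the image header
    and copies no data, so it is correct only if the image already
    holds everything that differs from the new backing file.
    """
    return ["qemu-img", "rebase", "-u", "-b", new_backing_file, image]


argv = rebase_unsafe_argv("vm.img", "base.img")
print(shlex.join(argv))
# The list could be handed to subprocess.run(argv, check=True).
```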

Stefan


