From: Paolo Bonzini
Subject: Re: [Qemu-devel] Proposal for extensions of block job commands in QEMU 1.2
Date: Mon, 21 May 2012 12:02:06 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1

On 21/05/2012 11:29, Kevin Wolf wrote:
>> * block-stream: I propose adding two options to the existing
>> block-stream command.  If this is rejected, only mirroring will be able
>> to use rerror/werror.
>>
>> The new options are of course rerror/werror.  They are enum options,
>> with the following possible values:
> 
> Do we really need separate werror/rerror? For guest operations they
> really exist only for historical reasons: werror was there first, and
> when we wanted the same functionality, it seemed odd to overload werror
> to include reads as well.
> 
> For block jobs, where there is no such option yet, we could go with a
> single error option, unless there is a use case for separate
> werror/rerror options.

For mirroring, rerror applies to the source and werror to the target.
I'm not sure there is an actual use case, but at least it is more
interesting than for devices...
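
For concreteness, on the streaming side an invocation with the proposed
options could look something like this (just a sketch of the proposal;
device name, base file and values are made up):

  -> { "execute": "block-stream",
       "arguments": { "device": "virtio0",
                      "base": "/images/base.qcow2",
                      "rerror": "stop",
                      "werror": "stop" } }
  <- { "return": {} }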

>> 'report': The behavior is the same as in 1.1.  An I/O error during a
>> read (for rerror) or a write (for werror) will complete the job
>> immediately with an error code.
>>
>> 'ignore': An I/O error during a read (for rerror) or a write (for
>> werror) will be ignored.  For streaming, the job will complete with an
>> error and the backing file will be left in place.  For mirroring, the
>> sector will be marked again as dirty and re-examined later.
> 
> This is not really 'ignore' as used for guest operations. There it means
> "no matter what the return value is, the operation has succeeded". For
> streaming it would mean that it just goes on with the next cluster (and
> if we don't cut the backing file link at the end, it would at least not
> corrupt anything).

Yes, for streaming it would mean that it just goes on with the next
cluster and then reports an error at the end.
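
That is, under 'ignore' management would only learn about the failure
from the completion event, something like (a sketch; payload abbreviated
and values made up):

  <- { "event": "BLOCK_JOB_COMPLETED",
       "data": { "type": "stream", "device": "virtio0",
                 "len": 10737418240, "offset": 10737418240,
                 "speed": 0, "error": "Input/output error" },
       "timestamp": { "seconds": 1337593575, "microseconds": 123456 } }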

> Just like with guest operations, it's a mostly useless mode; do we
> really need this option?

Perhaps we should remove it for guest operations as well; certainly it
makes more sense (if it makes sense at all) for jobs than for guest
operations.

>> 'stop': The VM *and* the job will be paused---the VM is stopped even if
>> the block device has neither rerror=stop nor werror={stop,enospc}.  The
>> error is recorded in the block device's iostatus (which can be examined
>> with query-block).  However, a BLOCK_IO_ERROR event will _never_ pause a
>> job.
>>
>>   Rationale: stopping all I/O seems to be the best choice in order
>>   to limit the number of errors received.  However, due to backwards-
>>   compatibility with QEMU 1.1 we cannot pause the job when guest-
>>   initiated I/O causes an error.  We could do that if the block
>>   device has rerror=stop/werror={stop,enospc}, but that seems more
>>   complicated than just never doing it.
> 
> I don't agree with stopping the VM. Consider a case where the target is
> somewhere on the network and you lose the connection, but the primary
> image is local on the hard disk. You don't want to stop the VM just
> because continuing with the copy isn't possible for the moment.

I think this is something that management should resolve.  For an error
on the source, stopping the VM makes sense.  I don't think management
cares about what caused an I/O error on a device.  Does it matter
whether streaming was active or the guest was executing "dd if=/dev/sda
of=/dev/null"?

Management may want to keep the VM stopped even for an error on the
target, once mirroring has finished the initial synchronization step.
Otherwise the VM can perform large amounts of I/O while the job is
paused, and completing the job afterwards can take a long time.
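
Either way, after a 'stop' the flow for management would be the usual
one for a stopped guest: look at the iostatus and decide what to do.
Roughly (a sketch, output heavily abbreviated):

  -> { "execute": "query-block" }
  <- { "return": [ { "device": "virtio0",
                     "io-status": "failed",
                     ... } ] }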

> Of course, this means that you can't reuse the block device's io_status,
> but you need a separate job_iostatus.

For mirroring, source and target are separate devices and have separate
iostatuses anyway.

> If the VM is stopped (including BLOCK_IO_ERROR), no I/O should be going
> on at all. Do we really keep running the jobs in 1.1? If so, this is a
> bug and should be fixed before the release.

Yes, we do.  Do you think it's a problem for migration (thinking more
about it: ouch, yes, it should be)?

We have no pause/resume infrastructure, so we could simply force
synchronous cancellation at the end (before vm_stop_force_state).
Stefan, do you have any free cycles for this?

>> * query-block-jobs: The returned JSON object will grow an additional
>> member, "target".  The target field is a dictionary with two fields,
>> "info" and "stats" (resembling the output of query-block and
>> query-blockstats but for the mirroring target).  Member "device" of the
>> BlockInfo structure will be made optional.
>>
>>   Rationale: this allows libvirt to observe the high watermark of qcow2
>>   mirroring targets, and avoids putting a bad iostatus on a working
>>   migration source.
> 
> The mirroring target should be present in query-block instead. It is a
> user-visible BlockDriverState.

It is not user visible, and making it user visible adds a lot more
things to worry about (e.g. making sure you cannot use it in a
device_add).

It reminds me of Xen's renaming of domains (foo->migrating-foo and
foo->zombie-foo), which was an endless source of pain.

I'd rather make the extension of query-block-jobs more generic, with a
list "devices" instead of a member "target", making up the device name
in the implementation (so you have "device": "target" for mirroring).
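
Roughly like this (a sketch of the shape I have in mind, not a final
schema; the numbers are made up):

  -> { "execute": "query-block-jobs" }
  <- { "return": [ { "type": "mirror", "device": "virtio0",
                     "len": 10737418240, "offset": 4194304, "speed": 0,
                     "devices": [ { "device": "target",
                                    "info":  { "io-status": "ok", ... },
                                    "stats": { "wr_highest_offset": 4194304, ... } } ] } ] }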

>> * block-job-complete: new command specific to mirroring (switches the
>> device to the target), not related to the rest of the proposal.
> 
> What semantics will block-job-cancel have then for mirroring? Will it be
> incompatible with RHEL 6?

block-job-cancel will keep the same semantics: make sure that the
target matches some state of the source, and then drop it.
block-job-complete adds the switch to the target (which I'll do with
something like bdrv_append, btw).  It synchronously opens the backing
files of the target, and asynchronously completes the job.
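
In QMP terms the difference is only in which command management sends
once the mirror is in sync (sketch):

  -> { "execute": "block-job-cancel", "arguments": { "device": "virtio0" } }
     (the target is left behind as a point-in-time copy, the guest keeps
     using the source)

  -> { "execute": "block-job-complete", "arguments": { "device": "virtio0" } }
     (the device is switched over to the target before the job completes)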

Upstream will not have drive-reopen; this will be incompatible with RHEL6.

Paolo


