
From: Daniel P . Berrangé
Subject: Re: [PATCH v3 7/7] migration: introduce snapshot-{save, load, delete} QMP commands
Date: Tue, 1 Sep 2020 17:47:32 +0100
User-agent: Mutt/1.14.6 (2020-07-11)

On Tue, Sep 01, 2020 at 04:20:47PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > savevm, loadvm and delvm are some of the few HMP commands that have never
> > been converted to use QMP. The primary reason for this lack of conversion
> > is that they block execution of the thread for as long as they run.
> 
> Nope.  The primary reason is that the HMP interface is bonkers.

I don't think that's a very helpful description. The HMP interface has
some limitations, but it isn't bonkers - it just doesn't cope with
all the use cases we want. Many people use it successfully without
issue.

> > Despite this downside, however, libvirt and applications using libvirt
> > have used these commands for as long as QMP has existed, via the
> > "human-monitor-command" passthrough command. IOW, while it is clearly
> > desirable to be able to fix the blocking problem, this is not an
> > immediate obstacle to real world usage.
> >
> > Meanwhile there is a need for other features which involve adding new
> > parameters to the commands. This is possible with HMP passthrough, but
> > it provides no reliable way for apps to introspect features, so using
> > QAPI modelling is highly desirable.
> >
> > This patch thus introduces new snapshot-{load,save,delete} commands to
> > QMP that are intended to replace the old HMP counterparts. The new
> > commands are given different names, because they will be using the new
> > QEMU job framework and thus will have diverging behaviour from the HMP
> > originals. It would thus be misleading to keep the same name.
> >
> > While this design uses the generic job framework, the current impl is
> > still blocking. The intention that the blocking problem is fixed later.
> > None the less applications using these new commands should assume that
> > they are asynchronous and thus wait for the job status change event to
> > indicate completion.
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> [...]
> > diff --git a/qapi/job.json b/qapi/job.json
> > index 280c2f76f1..51bee470f0 100644
> > --- a/qapi/job.json
> > +++ b/qapi/job.json
> > @@ -22,10 +22,17 @@
> >  #
> >  # @amend: image options amend job type, see "x-blockdev-amend" (since 5.1)
> >  #
> > +# @snapshot-load: snapshot load job type, see "loadvm" (since 5.2)
> 
> Do you mean 'see command @snapshot-load?

Yes, I guess so.

> 
> > +#
> > +# @snapshot-save: snapshot save job type, see "savevm" (since 5.2)
> 
> @snapshot-save?
> 
> > +#
> > +# @snapshot-delete: snapshot delete job type, see "delvm" (since 5.2)
> 
> @snapshot-delete?
> 
> > +#
> >  # Since: 1.7
> >  ##
> >  { 'enum': 'JobType',
> > -  'data': ['commit', 'stream', 'mirror', 'backup', 'create', 'amend'] }
> > +  'data': ['commit', 'stream', 'mirror', 'backup', 'create', 'amend',
> > +           'snapshot-load', 'snapshot-save', 'snapshot-delete'] }
> >  
> >  ##
> >  # @JobStatus:
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 5f6b06172c..d70f627b77 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1720,3 +1720,138 @@
> >  ##
> >  { 'event': 'UNPLUG_PRIMARY',
> >    'data': { 'device-id': 'str' } }
> > +
> > +##
> > +# @snapshot-save:
> > +#
> > +# Save a VM snapshot
> > +#
> > +# @job-id: identifier for the newly created job
> > +# @tag: name of the snapshot to create. If it already
> > +# exists it will be replaced.
> 
> Sounds a bit dangerous.  Require a force flag for such an overwrite?
> Not sure.

Yes, replacing is quite likely to be a mistake.

"@force" could mean many things, so "replace-existing: bool" is
probably a clearer name.

> 
> > +# @devices: list of block device node names to save a snapshot to
> > +# @vmstate: block device node name to save vmstate to
> 
> Worth mentioning that omitting writable block devices is probably a bad
> idea?

Sure

> > +#
> > +# Applications should not assume that the snapshot save is complete
> > +# when this command returns.
> 
> Is it complete then with the current code?  I'm asking because such
> properties have a way to sneakily become de facto ABI.  We may not be
> able to do anything about that now, other than documenting "don't do
> that" like you did, but I'd like to understand the state of affairs all
> the same.

Yes, with the current code the actual snapshot has completed by the
time the command returns.

> 
> > +#                            Completion is indicated by the job
> > +# status. Clients can wait for the JOB_STATUS_CHANGE event. If the
> > +# job aborts, errors can be obtained via the 'query-jobs' command,
> > +# though.
> 
> Sure we want to these job basics here?

This ties in with the previous point. I feel that if we don't document
the use of events here, then people are likely to blindly assume
synchronous completion. By explicitly telling them to wait for the
JOB_STATUS_CHANGE event, they are nudged towards a correct solution
that won't break if it becomes async later.
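
To make that concrete, here is a rough sketch of the client-side logic
I have in mind. It is Python working on already-decoded QMP messages,
so the transport is left out. The helper names wait_for_job and
job_error are made up for illustration; the JOB_STATUS_CHANGE event,
its "id" and "status" fields, the "concluded" status and the optional
"error" field in the query-jobs result are the existing job API.

```python
from typing import Iterable, List, Optional


def wait_for_job(messages: Iterable[dict], job_id: str) -> List[str]:
    """Consume decoded QMP messages until job_id reaches 'concluded'.

    Returns the statuses seen for that job, in arrival order.
    """
    seen = []
    for msg in messages:
        # Skip command replies and events that are not job status changes.
        if msg.get("event") != "JOB_STATUS_CHANGE":
            continue
        data = msg.get("data", {})
        # Skip status changes that belong to other jobs.
        if data.get("id") != job_id:
            continue
        seen.append(data.get("status"))
        if seen[-1] == "concluded":
            return seen
    raise RuntimeError(f"stream ended before job {job_id!r} concluded")


def job_error(query_jobs_return: List[dict], job_id: str) -> Optional[str]:
    """Pick the error string for job_id out of a 'query-jobs' result.

    Returns None if the job is absent or concluded without an error.
    """
    for info in query_jobs_return:
        if info.get("id") == job_id:
            return info.get("error")
    return None
```

A client written this way never relies on the command reply meaning
"done", so it behaves the same whether the job is synchronous or not.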

> 
> > +#         Note that at this time most vmstate procssing errors only
> 
> Typo: processing
> 
> Whatever a "vmstate processing error" is...
> 
> > +# get printed to stderr. This limitation will be fixed at a future
> > +# date.
> 
> Is that a promise?  ;)

I don't know when I'll have time, as I've not looked at just how
complex the conversion is. It is *highly* desirable to fix this,
since otherwise debugging failures is an exercise in extreme pain
through lack of useful information.

> 
> > +#
> > +# Note that the VM CPUs will be paused during the time it takes to
> > +# save the snapshot
> 
> End the sentence with a period, please.
> 
> > +#
> > +# If @devices is not specified, or is an empty list, then the
> > +# historical default logic for picking devices will be used.
> 
> Why is this useful for QMP?
> 
> > +#
> > +# If @vmstate is not specified, then the first valid block
> > +# device will be used for vmstate.
> 
> Why is this useful for QMP?

Both of these make QEMU just "do the right thing" with the majority
of QEMU guest configurations, with no special knowledge needed by
the mgmt app.

It makes it possible for all existing apps to immediately stop using
the loadvm/savevm commands via HMP passthrough, and convert to the
QMP commands.

Without this, applications will need to first convert to use -blockdev
before they can use the snapshot-load/snapshot-save commands, because
the devices are specified exclusively using blockdev node names, not
the legacy drive IDs. I didn't want to make blockdev a mandatory
dependency unless apps want to opt in to the fine-grained control
over disk choices.
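
As an illustration of the two modes, a mgmt app could build the
commands roughly like this. This is only a sketch: the helper name
snapshot_command and the node names in the usage note are
hypothetical, while the argument names (job-id, tag, devices, vmstate)
are the ones from the schema in this patch.

```python
from typing import List, Optional


def snapshot_command(name: str, job_id: str, tag: str,
                     devices: Optional[List[str]] = None,
                     vmstate: Optional[str] = None) -> dict:
    """Build a snapshot-save or snapshot-load QMP command as a dict.

    Leaving out devices/vmstate asks QEMU to apply its default
    device selection, so no blockdev node names are required.
    """
    if name not in ("snapshot-save", "snapshot-load"):
        raise ValueError(f"unsupported command: {name}")
    args = {"job-id": job_id, "tag": tag}
    # Only send the optional arguments when the app opts in to
    # explicit control over which block nodes are used.
    if devices:
        args["devices"] = list(devices)
    if vmstate is not None:
        args["vmstate"] = vmstate
    return {"execute": name, "arguments": args}
```

An existing app would call snapshot_command("snapshot-save", "job0",
"snap1") and nothing more. An app already using -blockdev could add
devices=["disk0", "disk1"] and vmstate="disk0" (made-up node names).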


> > +##
> > +# @snapshot-load:
> > +#
> > +# Load a VM snapshot
> > +#
> > +# @job-id: identifier for the newly created job
> > +# @tag: name of the snapshot to load.
> > +# @devices: list of block device node names to load a snapshot from
> > +# @vmstate: block device node name to load vmstate from
> 
> Worth mentioning that omitting block devices that may have changed since
> the save is probably a bad idea?

Yep.

> 
> > +#
> > +# Applications should not assume that the snapshot load is complete
> > +# when this command returns. Completion is indicated by the job
> > +# status. Clients can wait for the JOB_STATUS_CHANGE event. If the
> > +# job aborts, errors can be obtained via the 'query-jobs' command,
> > +# though. Note that at this time most vmstate procssing errors only
> > +# get printed to stderr. This limitation will be fixed at a future
> > +# date.
> 
> Comments on snapshot-load apply.
> 
> > +#
> > +# If @devices is not specified, or is an empty list, then the
> > +# historical default logic for picking devices will be used.
> 
> Why is this useful for QMP?
> 
> > +#
> > +# If @vmstate is not specified, then the first valid block
> > +# device will be used for vmstate.
> 
> Why is this useful for QMP?
> 
> A more useful default could be "if exactly one the block devices being
> restored contains a vmstate, use that".

I feel it is more important to be symmetric with snapshot-save, i.e. if
you supply or omit the same args for snapshot-save and snapshot-load,
you know both will work, or neither will work. You don't get into a
situation where you can successfully save the snapshot, but not restore
it.
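
Put differently, with symmetric defaults an app can derive the load
command mechanically from whatever it used to save. A minimal sketch,
assuming the save command is held as a decoded dict; the helper name
load_command_for is hypothetical, the argument names are those in the
schema above.

```python
def load_command_for(save_cmd: dict, job_id: str) -> dict:
    """Derive a snapshot-load command from a snapshot-save command.

    The tag and any explicitly chosen devices/vmstate are carried over
    unchanged; arguments that were omitted on save stay omitted on load.
    """
    if save_cmd.get("execute") != "snapshot-save":
        raise ValueError("expected a snapshot-save command")
    shared = ("tag", "devices", "vmstate")
    args = {key: value
            for key, value in save_cmd.get("arguments", {}).items()
            if key in shared}
    # Each job needs its own identifier, so that is the only new input.
    args["job-id"] = job_id
    return {"execute": "snapshot-load", "arguments": args}
```

If the load side used a different default for @vmstate, this kind of
one-to-one mapping would no longer be guaranteed to work.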


> > +##
> > +# @snapshot-delete:
> > +#
> > +# Delete a VM snapshot
> > +#
> > +# @job-id: identifier for the newly created job
> > +# @tag: name of the snapshot to delete.
> > +# @devices: list of block device node names to delete a snapshot from
> > +#
> > +# Applications should not assume that the snapshot load is complete
> > +# when this command returns. Completion is indicated by the job
> > +# status. Clients can wait for the JOB_STATUS_CHANGE event.
> 
> Comments on snapshot-load apply.
> 
> One difference: no "If the job aborts, ..."  Intentional?

I guess it can abort if the file is corrupt, perhaps. Generally,
though, if the named snapshot doesn't exist in the block device, it
is considered success, not an error.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



