[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v3 09/27] migration: add reporting of errors for
From: |
Daniel P. Berrange |
Subject: |
Re: [Qemu-devel] [PATCH v3 09/27] migration: add reporting of errors for outgoing migration |
Date: |
Fri, 4 Mar 2016 10:49:46 +0000 |
User-agent: |
Mutt/1.5.24 (2015-08-30) |
On Fri, Mar 04, 2016 at 10:49:35AM +0100, Markus Armbruster wrote:
> "Daniel P. Berrange" <address@hidden> writes:
>
> > Currently if an app initiates an outgoing migration, it
>
> application
>
> > may or may not, get an error reported back on failure. If
> > the error occurs synchronously to the 'migrate' command
> > execution, the client app will see the error message. This
> > is the case for DNS lookup failures. If the error occurs
> > asynchronously to the monitor command though, the error
> > will be thrown away and the client left guessing about
> > what went wrong. This is the case for failure to connect
> > to the TCP server (eg due to wrong port, or firewall
> > rules, or other similar errors).
> >
> > In the future we'll be adding more scope for errors to
> > happen asynchronously with the TLS protocol handshake.
> > TLS errors are hard to diagnose even when they are well
> > reported, so discarding errors entirely will make it
> > impossible to debug TLS connection problems.
> >
> > Management apps which do migration are already using
> > 'query-migrate' / 'info migrate' to check up on progress
> > of background migration operations and to see their end
> > status. This is a fine place to also include the error
> > message when things go wrong.
> >
> > This patch thus adds an 'error-desc' field to the
> > MigrationInfo struct, which will be populated when
> > the 'status' is set to 'failed':
> >
> > (qemu) migrate -d tcp:localhost:9001
> > (qemu) info migrate
> > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks:
> > off compress: off events: off x-postcopy-ram: off
> > Migration status: failed
> > total time: 0 milliseconds
> > error description: Error connecting to socket: Connection refused
>
> Perhaps:
>
> Migration status: failed (Error connecting to socket: Connection refused)
>
> Just an idea, use whatever you like better.
Yeah, that is nicer for the HMP.
> > In the HMP, when doing non-detached migration, it is
> > also possible to display this error message directly
> > to the app.
> >
> > (qemu) migrate tcp:localhost:9001
> > Error connecting to socket: Connection refused
>
> You could include a QMP example if you like.
Sure, will add it.
> > @@ -853,12 +857,14 @@ static void migrate_fd_cleanup(void *opaque)
> > notifier_list_notify(&migration_state_notifiers, s);
> > }
> >
> > -void migrate_fd_error(MigrationState *s)
> > +void migrate_fd_error(MigrationState *s, const Error *error)
> > {
> > - trace_migrate_fd_error();
> > + trace_migrate_fd_error(error ? error_get_pretty(error) : "");
> > assert(s->to_dst_file == NULL);
> > migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
> > MIGRATION_STATUS_FAILED);
> > + error_free(s->error);
> > + s->error = error_copy(error);
>
> Can migrate_fd_error() be called more than once per migration attempt?
I think so, but I felt it worth being paranoid against mistakes
so I chose to call error_free() just in case.
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index 7b8f2a1..ff89747 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -484,6 +484,8 @@
> > # throttled during auto-converge. This is only present when
> > auto-converge
> > # has started throttling guest cpus. (Since 2.5)
> > #
> > +# @error-desc: the error description string, when @status == 'failed'
> > (Since 2.6)
> > +#
>
> "when @status == 'failed'" is a semantic constraint, not visible in
> query-qmp-schema. Several existing members have similar constraints.
>
> Making this a flat union tagged by @status would turn semantic
> constraints into syntax, visible in query-qmp-schema. Not sure it's
> worth the churn now.
>
> Note that qmp_query_migrate() may not actually set @error-desc when
> @status is 'failed'. Bug in either the code or the documentation.
Bug in docs :-)
> Further note that code uses it without also checking status. You could
> assert the constraint holds. Your choice.
>
> Let me take a step back and examine the bigger picture.
>
> Migration is a long-running task with a non-trivial live cycle (see enum
> MigrationStatus). Similar tasks exist in the block layer ("block
> jobs"), and perhaps we can have a generic "jobs" abstraction some day.
>
> Originally, QMP was designed to permit doing long-running tasks as
> asynchronous commands. We ended up doing them as synchronous commands +
> events + status queries. Two reasons, one accidental, one fundamental.
>
> The accidental one is asynchronous commands never quite worked, so
> nobody used them, so nobody bothered to fix them.
>
> The fundamental one is complex life cycles. A job that starts, runs
> silently for a while, then either completes or fails can be done nicely
> as asynchronous command. But when the job's life cycle is more complex,
> a single command reply isn't enough. Thus commands + events + status
> queries.
>
> In this model, the reply to a status query in a "failed" state takes the
> role of the asynchronous command's error reply. It therefore makes
> sense to compare the two.
>
> The QMP error reply is documented as follows in qmp-spec.txt:
>
> The format of an error response is:
>
> { "error": { "class": json-string, "desc": json-string }, "id":
> json-value }
>
> Where,
>
> - The "class" member contains the error class name (eg. "GenericError")
> - The "desc" member is a human-readable error message. Clients should
> not attempt to parse this message.
> - The "id" member contains the transaction identification associated with
> the command execution if issued by the Client
>
> MigrationInfo in a "failed" contains @status and @error-desc, and may
> contain additional members like @total-time (not entirely clear from
> documentation, which means the documentation is too vague).
>
> MigrationInfo covers @desc, but not @class and @id.
>
> Use of @class is discouraged nowadays, and omitting it here at least
> until we have a compelling use for it makes sense.
>
> @id ties the asynchronous reply to the command. Not necessary as long
> as only one migration task can exist.
>
> The only thing I'd like you to add to MigrationInfo is the "Clients
> should not attempt to parse this" admonition. Suggest to describe
> @error-desc as "human-readable" while there.
Yes, makes sense.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|