qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v9 1/8] Introduce yank feature


From: Markus Armbruster
Subject: Re: [PATCH v9 1/8] Introduce yank feature
Date: Fri, 30 Oct 2020 15:02:09 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

Lukas Straub <lukasstraub2@web.de> writes:

> On Thu, 29 Oct 2020 17:36:14 +0100
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> Nothing major, looks almost ready to me.
>> 
>> Lukas Straub <lukasstraub2@web.de> writes:
>> 
>> > The yank feature allows to recover from hanging qemu by "yanking"
>> > at various parts. Other qemu systems can register themselves and
>> > multiple yank functions. Then all yank functions for selected
>> > instances can be called by the 'yank' out-of-band qmp command.
>> > Available instances can be queried by a 'query-yank' oob command.
>> >
>> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
>> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
>> > ---
>> >  include/qemu/yank.h |  95 ++++++++++++++++++++
>> >  qapi/misc.json      | 106 ++++++++++++++++++++++
>> >  util/meson.build    |   1 +
>> >  util/yank.c         | 213 ++++++++++++++++++++++++++++++++++++++++++++  
>> 
>> checkpatch.pl warns:
>> 
>>     WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
>> 
>> Can we find a maintainer for the two new files?
>
> Yes, I'm maintaining this for now, see patch 7.

Thanks!  Would it make sense to add the yank stuff to a new QAPI module
yank.json instead of misc.jaon, so the new MAINTAINERS stanza can cover
it?

[...]
>> > diff --git a/qapi/misc.json b/qapi/misc.json
>> > index 40df513856..3b7de02a4d 100644
>> > --- a/qapi/misc.json
>> > +++ b/qapi/misc.json
[...]
>> > +##
>> > +# @YankInstance:
>> > +#
>> > +# A yank instance can be yanked with the "yank" qmp command to recover 
>> > from a
>> > +# hanging qemu.  
>> 
>> QEMU
>> 
>> > +#
>> > +# Currently implemented yank instances:
>> > +#  -nbd block device:
>> > +#   Yanking it will shutdown the connection to the nbd server without
>> > +#   attempting to reconnect.
>> > +#  -socket chardev:
>> > +#   Yanking it will shutdown the connected socket.
>> > +#  -migration:
>> > +#   Yanking it will shutdown all migration connections.  
>> 
>> To my surprise, this is recognized as bullet list markup.  But please
>> put a space between the bullet and the text anyway.
>> 
>> Also: "shutdown" is a noun, the verb is spelled "shut down".
>
> Both changed for the next version.
>
>> In my review of v8, I asked how yanking migration is related to command
>> migrate_cancel.  Daniel explained:
>> 
>>     migrate_cancel will do a shutdown() on the primary migration socket only.
>>     In addition it will toggle the migration state.
>> 
>>     Yanking will do a shutdown on all migration sockets (important for
>>     multifd), but won't touch migration state or any other aspect of QEMU
>>     code.
>> 
>>     Overall yanking has less potential for things to go wrong than the
>>     migrate_cancel method, as it doesn't try to do any kind of cleanup
>>     or migration.
>> 
>> Would it make sense to work this into the documentation?
>
> How about this?
>
>   - migration:
>     Yanking it will shut down all migration connections. Unlike
>     @migrate_cancel, it will not notify the migration process,
>     so migration will go into @failed state, instead of @cancelled
>     state.

Works for me.  Advice on when to use it rather than migrate_cancel would
be nice, though.

>> > +#
>> > +# Since: 5.2
>> > +##
>> > +{ 'union': 'YankInstance',
>> > +  'base': { 'type': 'YankInstanceType' },
>> > +  'discriminator': 'type',
>> > +  'data': {
>> > +      'blockdev': 'YankInstanceBlockdev',
>> > +      'chardev': 'YankInstanceChardev' } }
>> > +
>> > +##
>> > +# @yank:
>> > +#
>> > +# Recover from hanging qemu by yanking the specified instances. See  
>> 
>> QEMU
>> 
>> "Try to recover" would be more precise, I think.
>
> Changed for the next version.
>
>> > +# "YankInstance" for more information.
>> > +#
>> > +# Takes a list of @YankInstance as argument.
>> > +#
>> > +# Returns: nothing.
>> > +#
>> > +# Example:
>> > +#
>> > +# -> { "execute": "yank",
>> > +#      "arguments": {
>> > +#          "instances": [
>> > +#               { "type": "block-node",
>> > +#                 "node-name": "nbd0" }
>> > +#          ] } }
>> > +# <- { "return": {} }
>> > +#
>> > +# Since: 5.2
>> > +##
>> > +{ 'command': 'yank',
>> > +  'data': { 'instances': ['YankInstance'] },
>> > +  'allow-oob': true }
>> > +
[...]
>> > diff --git a/util/yank.c b/util/yank.c
>> > new file mode 100644
>> > index 0000000000..0b3a816706
>> > --- /dev/null
>> > +++ b/util/yank.c
[...]
>> > +void qmp_yank(YankInstanceList *instances,
>> > +              Error **errp)
>> > +{
>> > +    YankInstanceList *tail;
>> > +    YankInstanceEntry *entry;
>> > +    YankFuncAndParam *func_entry;
>> > +
>> > +    qemu_mutex_lock(&yank_lock);
>> > +    for (tail = instances; tail; tail = tail->next) {
>> > +        entry = yank_find_entry(tail->value);
>> > +        if (!entry) {
>> > +            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Instance not 
>> > found");  
>> 
>> Quote error.h:
>> 
>>  * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is
>>  * strongly discouraged.
>> 
>> Any particular reason for ERROR_CLASS_DEVICE_NOT_FOUND?  If not, then
>> error_setg(), please.
>
> There may be a time-to-check to time-to-use race condition here. Suppose
> the management application (via QMP) calls 'blockev-del nbd0', then QEMU
> hangs, so after a timeout it calls 'yank nbd0' while shortly before that
> QEMU unhangs for some reason and removes the blockdev. Then yank returns
> this error.
>
> QMP errors should not be parsed except for the error class, so the the
> management application can only differentiate between this (ignorable)
> error and other (possibly fatal) ones by parsing the error class.

I see.

DeviceNotFound doesn't really fit, but none of the other error classes
is any better.

I think the line

      # Returns: nothing.

in the QAPI schema (quoted above) should be expanded to something like


      # Returns: - Nothing on success
                 - If the YankInstance doesn't exist, DeviceNotFound

>> > +            qemu_mutex_unlock(&yank_lock);
>> > +            return;
>> > +        }
>> > +    }
>> > +    for (tail = instances; tail; tail = tail->next) {
>> > +        entry = yank_find_entry(tail->value);
>> > +        assert(entry);
>> > +        QLIST_FOREACH(func_entry, &entry->yankfns, next) {
>> > +            func_entry->func(func_entry->opaque);
>> > +        }
>> > +    }
>> > +    qemu_mutex_unlock(&yank_lock);
>> > +}
[...]




reply via email to

[Prev in Thread] Current Thread [Next in Thread]