[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v9 1/8] Introduce yank feature
From: |
Markus Armbruster |
Subject: |
Re: [PATCH v9 1/8] Introduce yank feature |
Date: |
Fri, 30 Oct 2020 15:02:09 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) |
Lukas Straub <lukasstraub2@web.de> writes:
> On Thu, 29 Oct 2020 17:36:14 +0100
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> Nothing major, looks almost ready to me.
>>
>> Lukas Straub <lukasstraub2@web.de> writes:
>>
>> > The yank feature allows to recover from hanging qemu by "yanking"
>> > at various parts. Other qemu systems can register themselves and
>> > multiple yank functions. Then all yank functions for selected
>> > instances can be called by the 'yank' out-of-band qmp command.
>> > Available instances can be queried by a 'query-yank' oob command.
>> >
>> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
>> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
>> > ---
>> > include/qemu/yank.h | 95 ++++++++++++++++++++
>> > qapi/misc.json | 106 ++++++++++++++++++++++
>> > util/meson.build | 1 +
>> > util/yank.c | 213 ++++++++++++++++++++++++++++++++++++++++++++
>>
>> checkpatch.pl warns:
>>
>> WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
>>
>> Can we find a maintainer for the two new files?
>
> Yes, I'm maintaining this for now, see patch 7.
Thanks! Would it make sense to add the yank stuff to a new QAPI module
yank.json instead of misc.jaon, so the new MAINTAINERS stanza can cover
it?
[...]
>> > diff --git a/qapi/misc.json b/qapi/misc.json
>> > index 40df513856..3b7de02a4d 100644
>> > --- a/qapi/misc.json
>> > +++ b/qapi/misc.json
[...]
>> > +##
>> > +# @YankInstance:
>> > +#
>> > +# A yank instance can be yanked with the "yank" qmp command to recover
>> > from a
>> > +# hanging qemu.
>>
>> QEMU
>>
>> > +#
>> > +# Currently implemented yank instances:
>> > +# -nbd block device:
>> > +# Yanking it will shutdown the connection to the nbd server without
>> > +# attempting to reconnect.
>> > +# -socket chardev:
>> > +# Yanking it will shutdown the connected socket.
>> > +# -migration:
>> > +# Yanking it will shutdown all migration connections.
>>
>> To my surprise, this is recognized as bullet list markup. But please
>> put a space between the bullet and the text anyway.
>>
>> Also: "shutdown" is a noun, the verb is spelled "shut down".
>
> Both changed for the next version.
>
>> In my review of v8, I asked how yanking migration is related to command
>> migrate_cancel. Daniel explained:
>>
>> migrate_cancel will do a shutdown() on the primary migration socket only.
>> In addition it will toggle the migration state.
>>
>> Yanking will do a shutdown on all migration sockets (important for
>> multifd), but won't touch migration state or any other aspect of QEMU
>> code.
>>
>> Overall yanking has less potential for things to go wrong than the
>> migrate_cancel method, as it doesn't try to do any kind of cleanup
>> or migration.
>>
>> Would it make sense to work this into the documentation?
>
> How about this?
>
> - migration:
> Yanking it will shut down all migration connections. Unlike
> @migrate_cancel, it will not notify the migration process,
> so migration will go into @failed state, instead of @cancelled
> state.
Works for me. Advice on when to use it rather than migrate_cancel would
be nice, though.
>> > +#
>> > +# Since: 5.2
>> > +##
>> > +{ 'union': 'YankInstance',
>> > + 'base': { 'type': 'YankInstanceType' },
>> > + 'discriminator': 'type',
>> > + 'data': {
>> > + 'blockdev': 'YankInstanceBlockdev',
>> > + 'chardev': 'YankInstanceChardev' } }
>> > +
>> > +##
>> > +# @yank:
>> > +#
>> > +# Recover from hanging qemu by yanking the specified instances. See
>>
>> QEMU
>>
>> "Try to recover" would be more precise, I think.
>
> Changed for the next version.
>
>> > +# "YankInstance" for more information.
>> > +#
>> > +# Takes a list of @YankInstance as argument.
>> > +#
>> > +# Returns: nothing.
>> > +#
>> > +# Example:
>> > +#
>> > +# -> { "execute": "yank",
>> > +# "arguments": {
>> > +# "instances": [
>> > +# { "type": "block-node",
>> > +# "node-name": "nbd0" }
>> > +# ] } }
>> > +# <- { "return": {} }
>> > +#
>> > +# Since: 5.2
>> > +##
>> > +{ 'command': 'yank',
>> > + 'data': { 'instances': ['YankInstance'] },
>> > + 'allow-oob': true }
>> > +
[...]
>> > diff --git a/util/yank.c b/util/yank.c
>> > new file mode 100644
>> > index 0000000000..0b3a816706
>> > --- /dev/null
>> > +++ b/util/yank.c
[...]
>> > +void qmp_yank(YankInstanceList *instances,
>> > + Error **errp)
>> > +{
>> > + YankInstanceList *tail;
>> > + YankInstanceEntry *entry;
>> > + YankFuncAndParam *func_entry;
>> > +
>> > + qemu_mutex_lock(&yank_lock);
>> > + for (tail = instances; tail; tail = tail->next) {
>> > + entry = yank_find_entry(tail->value);
>> > + if (!entry) {
>> > + error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Instance not
>> > found");
>>
>> Quote error.h:
>>
>> * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is
>> * strongly discouraged.
>>
>> Any particular reason for ERROR_CLASS_DEVICE_NOT_FOUND? If not, then
>> error_setg(), please.
>
> There may be a time-to-check to time-to-use race condition here. Suppose
> the management application (via QMP) calls 'blockev-del nbd0', then QEMU
> hangs, so after a timeout it calls 'yank nbd0' while shortly before that
> QEMU unhangs for some reason and removes the blockdev. Then yank returns
> this error.
>
> QMP errors should not be parsed except for the error class, so the the
> management application can only differentiate between this (ignorable)
> error and other (possibly fatal) ones by parsing the error class.
I see.
DeviceNotFound doesn't really fit, but none of the other error classes
is any better.
I think the line
# Returns: nothing.
in the QAPI schema (quoted above) should be expanded to something like
# Returns: - Nothing on success
- If the YankInstance doesn't exist, DeviceNotFound
>> > + qemu_mutex_unlock(&yank_lock);
>> > + return;
>> > + }
>> > + }
>> > + for (tail = instances; tail; tail = tail->next) {
>> > + entry = yank_find_entry(tail->value);
>> > + assert(entry);
>> > + QLIST_FOREACH(func_entry, &entry->yankfns, next) {
>> > + func_entry->func(func_entry->opaque);
>> > + }
>> > + }
>> > + qemu_mutex_unlock(&yank_lock);
>> > +}
[...]
[PATCH v9 3/8] chardev/char-socket.c: Add yank feature, Lukas Straub, 2020/10/28
[PATCH v9 4/8] migration: Add yank feature, Lukas Straub, 2020/10/28
[PATCH v9 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe, Lukas Straub, 2020/10/28
[PATCH v9 6/8] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown, Lukas Straub, 2020/10/28
[PATCH v9 7/8] MAINTAINERS: Add myself as maintainer for yank feature, Lukas Straub, 2020/10/28
[PATCH v9 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test, Lukas Straub, 2020/10/28