[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] monitor/qmp: fix race with clients disconnecting early
From: |
Philippe Mathieu-Daudé |
Subject: |
Re: [PATCH] monitor/qmp: fix race with clients disconnecting early |
Date: |
Mon, 23 Aug 2021 17:05:20 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 |
Cc'ing Marc-André
On 8/23/21 12:11 PM, Stefan Reiter wrote:
> From: Stefan Reiter <stefan@pimaker.at>
>
> The following sequence can produce a race condition that results in
> responses meant for different clients being sent to the wrong one:
>
> (QMP, no OOB)
> 1) client A connects
> 2) client A sends 'qmp_capabilities'
> 3) 'qmp_dispatch' runs in coroutine, schedules out to
> 'do_qmp_dispatch_bh' and yields
> 4) client A disconnects (i.e. aborts, crashes, etc...)
> 5) client B connects
> 6) 'do_qmp_dispatch_bh' runs 'qmp_capabilities' and wakes calling coroutine
> 7) capabilities are now set and 'mon->commands' is set to '&qmp_commands'
> 8) 'qmp_dispatch' returns to 'monitor_qmp_dispatch'
> 9) success message is sent to client B *without it ever having sent
> 'qmp_capabilities' itself*
> 9a) even if client B ignores it, it will now presumably send it's own
> greeting, which will error because caps are already set
>
> The fix proposed here uses an atomic, sequential connection number
> stored in the MonitorQMP struct, which is incremented every time a new
> client connects. Since it is not changed on CHR_EVENT_CLOSED, the
> behaviour of allowing a client to disconnect only one side of the
> connection is retained.
>
> The connection_nr needs to be exposed outside of the monitor subsystem,
> since qmp_dispatch lives in qapi code. It needs to be checked twice,
> once for actually running the command in the BH (fixes 7/9a), and once for
> sending back a response (fixes 9).
>
> This satisfies my local reproducer - using multiple clients constantly
> looping to open a connection, send the greeting, then exiting no longer
> crashes other, normally behaving clients with unrelated responses.
>
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
>
> This is another patch in the apparently never-ending series of fixes to
> QMP-related race conditions. Our PVE users really seem to have a knack for
> triggering these ;)
>
> Here's one of the (several) forum threads where we tried to diagnose the
> issue:
> https://forum.proxmox.com/threads/error-with-backup-when-backing-up-qmp-command-query-backup-failed-got-wrong-command-id.88017/
>
> It manifested itself under load, where sometimes our monitor code (which uses
> the 'id' system of QMP to track commands) would receive bogus responses,
> showing
> up as "got wrong command id" errors in our logs.
>
> I'm not sure this approach is the best fix, it seems a lot like a band-aid to
> me, but it's the best I could come up with for now - open for different ideas
> of
> course.
>
> Note that with this patch, we're no longer running a command that was
> scheduled
> after a client has disconnected early. This seems like a slight behaviour
> change
> to me... On the other hand, I didn't want to drag the connection number into
> qmp_capabilities() and special case that even further.
>
> I didn't look to deeply into 'QMP in iothread' yet, if that does what I think
> it
> does there might be two more potential races here:
> * between 'do_qmp_dispatch_bh' running 'aio_co_wake' and 'qmp_dispatch'
> actually
> yielding (very unlikely though)
> * plus a TOCTOU in 'do_qmp_dispatch_bh' with the atomic read of the
> connection_nr and actually running cmd->fn()
>
> Thanks!
>
>
> include/monitor/monitor.h | 1 +
> monitor/monitor-internal.h | 7 +++++++
> monitor/monitor.c | 15 +++++++++++++++
> monitor/qmp.c | 15 ++++++++++++++-
> qapi/qmp-dispatch.c | 21 +++++++++++++++++----
> stubs/monitor-core.c | 5 +++++
> 6 files changed, 59 insertions(+), 5 deletions(-)
>
> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> index af3887bb71..ff6db1448b 100644
> --- a/include/monitor/monitor.h
> +++ b/include/monitor/monitor.h
> @@ -16,6 +16,7 @@ extern QemuOptsList qemu_mon_opts;
> Monitor *monitor_cur(void);
> Monitor *monitor_set_cur(Coroutine *co, Monitor *mon);
> bool monitor_cur_is_qmp(void);
> +int monitor_get_connection_nr(const Monitor *mon);
>
> void monitor_init_globals(void);
> void monitor_init_globals_core(void);
> diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
> index 9c3a09cb01..a92be8c3f7 100644
> --- a/monitor/monitor-internal.h
> +++ b/monitor/monitor-internal.h
> @@ -144,6 +144,13 @@ typedef struct {
> QemuMutex qmp_queue_lock;
> /* Input queue that holds all the parsed QMP requests */
> GQueue *qmp_requests;
> +
> + /*
> + * A sequential number that gets incremented on every new
> CHR_EVENT_OPENED.
> + * Used to avoid leftover responses in BHs from being sent to the wrong
> + * client. Access with atomics.
> + */
> + int connection_nr;
> } MonitorQMP;
>
> /**
> diff --git a/monitor/monitor.c b/monitor/monitor.c
> index 636bcc81c5..ee5ac26c37 100644
> --- a/monitor/monitor.c
> +++ b/monitor/monitor.c
> @@ -136,6 +136,21 @@ bool monitor_cur_is_qmp(void)
> return cur_mon && monitor_is_qmp(cur_mon);
> }
>
> +/**
> + * If @mon is a QMP monitor, return the connection_nr, otherwise -1.
> + */
> +int monitor_get_connection_nr(const Monitor *mon)
> +{
> + MonitorQMP *qmp_mon;
> +
> + if (!monitor_is_qmp(mon)) {
> + return -1;
> + }
> +
> + qmp_mon = container_of(mon, MonitorQMP, common);
> + return qatomic_read(&qmp_mon->connection_nr);
> +}
> +
> /**
> * Is @mon is using readline?
> * Note: not all HMP monitors use readline, e.g., gdbserver has a
> diff --git a/monitor/qmp.c b/monitor/qmp.c
> index 0ef7cebb78..3ec67e32d3 100644
> --- a/monitor/qmp.c
> +++ b/monitor/qmp.c
> @@ -141,6 +141,8 @@ static void monitor_qmp_dispatch(MonitorQMP *mon, QObject
> *req)
> QDict *rsp;
> QDict *error;
>
> + int conn_nr_before = qatomic_read(&mon->connection_nr);
> +
> rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon),
> &mon->common);
>
> @@ -156,7 +158,17 @@ static void monitor_qmp_dispatch(MonitorQMP *mon,
> QObject *req)
> }
> }
>
> - monitor_qmp_respond(mon, rsp);
> + /*
> + * qmp_dispatch might have yielded and waited for a BH, in which case
> there
> + * is a chance a new client connected in the meantime - if this happened,
> + * the command will not have been executed, but we also need to ensure
> that
> + * we don't send back a corresponding response on a line that no longer
> + * belongs to this request.
> + */
> + if (conn_nr_before == qatomic_read(&mon->connection_nr)) {
> + monitor_qmp_respond(mon, rsp);
> + }
> +
> qobject_unref(rsp);
> }
>
> @@ -444,6 +456,7 @@ static void monitor_qmp_event(void *opaque, QEMUChrEvent
> event)
>
> switch (event) {
> case CHR_EVENT_OPENED:
> + qatomic_inc_fetch(&mon->connection_nr);
> mon->commands = &qmp_cap_negotiation_commands;
> monitor_qmp_caps_reset(mon);
> data = qmp_greeting(mon);
> diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
> index 59600210ce..95602446eb 100644
> --- a/qapi/qmp-dispatch.c
> +++ b/qapi/qmp-dispatch.c
> @@ -120,16 +120,28 @@ typedef struct QmpDispatchBH {
> QObject **ret;
> Error **errp;
> Coroutine *co;
> + int conn_nr;
> } QmpDispatchBH;
>
> static void do_qmp_dispatch_bh(void *opaque)
> {
> QmpDispatchBH *data = opaque;
>
> - assert(monitor_cur() == NULL);
> - monitor_set_cur(qemu_coroutine_self(), data->cur_mon);
> - data->cmd->fn(data->args, data->ret, data->errp);
> - monitor_set_cur(qemu_coroutine_self(), NULL);
> + /*
> + * A QMP monitor tracks it's client with a connection number, if this
> + * changes during the scheduling delay of this BH, we must not execute
> the
> + * command. Otherwise a badly placed 'qmp_capabilities' might affect the
> + * connection state of a client it was never meant for.
> + */
> + if (data->conn_nr == monitor_get_connection_nr(data->cur_mon)) {
> + assert(monitor_cur() == NULL);
> + monitor_set_cur(qemu_coroutine_self(), data->cur_mon);
> + data->cmd->fn(data->args, data->ret, data->errp);
> + monitor_set_cur(qemu_coroutine_self(), NULL);
> + } else {
> + error_setg(data->errp, "active monitor connection changed");
> + }
> +
> aio_co_wake(data->co);
> }
>
> @@ -243,6 +255,7 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject
> *request,
> .ret = &ret,
> .errp = &err,
> .co = qemu_coroutine_self(),
> + .conn_nr = monitor_get_connection_nr(cur_mon),
> };
> aio_bh_schedule_oneshot(qemu_get_aio_context(), do_qmp_dispatch_bh,
> &data);
> diff --git a/stubs/monitor-core.c b/stubs/monitor-core.c
> index d058a2a00d..3290b58120 100644
> --- a/stubs/monitor-core.c
> +++ b/stubs/monitor-core.c
> @@ -13,6 +13,11 @@ Monitor *monitor_set_cur(Coroutine *co, Monitor *mon)
> return NULL;
> }
>
> +int monitor_get_connection_nr(const Monitor *mon)
> +{
> + return -1;
> +}
> +
> void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
> {
> }
>