[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] monitor/qmp: fix race with clients disconnecting early
From: |
Markus Armbruster |
Subject: |
Re: [PATCH] monitor/qmp: fix race with clients disconnecting early |
Date: |
Thu, 26 Aug 2021 15:50:21 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) |
Markus Armbruster <armbru@redhat.com> writes:
[...]
> Let me re-explain the bug in my own words, to make sure I understand.
>
> A QMP monitor normally runs in the monitor I/O thread.
>
> A QMP monitor can serve only one client at a time.
>
> It executes out-of-band commands right as it reads them. In-band
> commands are queued, and executed one after the other in the main loop.
>
> Command output is buffered. We write it out as fast as the character
> device can take it. If a write fails, we throw away the entire buffer
> contents.
>
> A client can disconnect at any time. This throws away the queue. An
> in-band command may be executing in the main loop. An out-of-band
> command may be executing in the monitor's thread.
>
> Such commands (if any) are not affected by the disconnect. Their output
> gets buffered, but write out fails, so it's thrown away.
>
> *Except* when another client connects quickly enough. Then we send it
> output meant for the previous client. This is wrong. I suspect this
> could even send invalid JSON.
>
> Special case: if in-band command qmp_capabilities is executing when the
> client disconnects, and another client connects before the command flips
> the monitor from capabilities negotiation mode to command mode, that
> client starts in the wrong mode.
What the cases have in common: disconnect + connect in monitor I/O
thread and the command executing in the main thread change the same
monitor state.
You observed two issues: one involving the output buffer (new client
receives old client's output), and one involving monitor mode (new
client has its mode flipped by the old client's qmp_capabilities
command).
Any monitor state accessed by commands could cause issues. Right now, I
can see only one more: file descriptors. Cleaning them up on disconnect
could mess with the command.
[...]