Re: [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread

From:	Markus Armbruster
Subject:	Re: [Qemu-devel] [RFC v2 0/8] monitor: allow per-monitor thread
Date:	Thu, 07 Sep 2017 19:41:29 +0200
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)
"Daniel P. Berrange" <address@hidden> writes:

> On Thu, Sep 07, 2017 at 02:59:28PM +0200, Markus Armbruster wrote:
>> So, what exactly is going to drain the command queue?  If there's more
>> than one consumer, how exactly are commands from the queue dispatched to
>> the consumers?
>
> In terms of my proposal, for any single command there should only ever
> be a single consumer. The default consumer would be the main event loop
> thread, such that we have no semantic change to QMP operation from today.
>
> Some commands that are capable of being made "async", would have a
> different consumer. For example, if the client requested the 'migrate-cancel'
> be made async, this would change things such that the migration thread is
> now responsible for consuming the "migrate-cancel" command, instead of the
> default main loop.
>
>> What are the "no hang" guarantees (if any) and conditions for each of
>> these consumers?
>
> The non-main thread consumers would have to have some reasonable
> guarantee that they won't block on a lock held by the main loop,
> otherwise the whole feature is largely useless.

Same if they block indefinitely on anything else, actually.  In other
words, we need to talk about liveness.

Threads by themselves don't buy us liveness.  Being careful with
operations that may block does.  That care may lead to farming out
certain operations to other threads, where they may block without harm.

You only talk about "the non-main thread consumers".  What about the
main thread?  Is it okay for the main thread to block?  If yes, why?
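To illustrate the point that threads only help when combined with care about blocking operations, here is a minimal sketch, assuming a monitor loop that reads commands from an in-process queue. The loop never runs a possibly-blocking handler itself. It farms that handler out to a worker thread and moves on. The command names, the handler table and the may_block flag are hypothetical and are not QEMU code.

```python
import queue
import threading
import time

# Hypothetical sketch, not QEMU code: the monitor loop stays live by never
# running a possibly-blocking handler on its own thread.

def slow_query():
    time.sleep(0.2)  # stands in for an operation that may block
    return "slow-done"

def quick_query():
    return "quick-done"

# command name -> (handler, may_block); the flag is an assumption
HANDLERS = {
    "slow-query": (slow_query, True),
    "quick-query": (quick_query, False),
}

def run_and_reply(cmd, handler, outbox):
    outbox.put((cmd, handler()))

def monitor_loop(inbox, outbox):
    """Process commands until a None sentinel arrives."""
    workers = []
    while True:
        cmd = inbox.get()
        if cmd is None:
            break
        handler, may_block = HANDLERS[cmd]
        if may_block:
            # Farm the blocking work out; the loop moves on immediately.
            t = threading.Thread(target=run_and_reply, args=(cmd, handler, outbox))
            t.start()
            workers.append(t)
        else:
            run_and_reply(cmd, handler, outbox)
    for t in workers:  # only for a clean shutdown of this demo
        t.join()

inbox, outbox = queue.Queue(), queue.Queue()
for cmd in ("slow-query", "quick-query", None):
    inbox.put(cmd)
monitor_loop(inbox, outbox)
replies = [outbox.get() for _ in range(2)]
print(replies)  # the quick reply is not held up by the slow one
```

Note that this only buys liveness as long as the worker never holds something the monitor loop itself waits for, which is the same caveat Daniel raises about locks held by the main loop.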

>> We can have any number of QMP monitors today.  Would each of them feed
>> its own queue?  Would they all feed a shared queue?
>
> Currently with multiple QMP monitors, everything runs in the main
> loop, so commands arriving across multiple monitors are 100%
> serialized and processed strictly in the order in which QEMU reads
> them off the wire.  To maintain these semantics, we would need to
> have a single shared queue for the default main loop consumer, so
> that ordering does not change.
>
>> How exactly is opt-in asynchronous to work?  Per QMP monitor?  Per
>> command?
>
> Per monitor+command, i.e. just because libvirt knows how to cope with
> async execution on the monitor it has open does not mean that a
> different app on the 2nd monitor can cope. So in my proposal the
> switch to async must be scoped to the particular command, and only
> for the monitor connection that requested it.
>
>> What does it mean when an asynchronous command follows a synchronous
>> command in the same QMP monitor?  I would expect the synchronous command
>> to complete before the asynchronous command, because that's what
>> synchronous means, isn't it?  To keep your QMP monitor available, you
>> then must not send synchronous commands that can hang.
>
> No, that is not what I described. All synchronous commands are
> serialized wrt each other, just as today. An asynchronous command
> can run as soon as it is received, regardless of whether any
> earlier sent sync commands are still executing or pending. This
> is trivial to achieve when you separate monitor I/O from command
> execution in separate threads, provided of course the async
> command consumers are not in the main loop.

So, a synchronous command is synchronous with respect to other commands,
except for certain non-blocking commands.  The distinctive feature of
the latter isn't so much an asynchronous reply, but out-of-band
dispatch.

Out-of-band dispatch of commands that cannot block is in fact orthogonal to
asynchronous replies.  I can't see why out-of-band dispatch of
synchronous non-blocking commands wouldn't work, too.
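The dispatch model discussed above can be sketched as follows, under stated assumptions: a single in-order queue shared by all monitors for synchronous commands (preserving today's ordering), an opt-in scoped to one command on one monitor, and opted-in commands dispatched out-of-band. Here the out-of-band command simply runs on the receiving thread and produces an ordinary synchronous reply, i.e. out-of-band dispatch without asynchronous replies. The Dispatcher class, its methods and the monitor IDs are hypothetical; in Daniel's proposal migrate-cancel would instead be consumed by the migration thread.

```python
import queue
import threading

class Dispatcher:
    """Hypothetical sketch, not QEMU's design: one shared in-order queue for
    synchronous commands, plus per-(monitor, command) out-of-band opt-in."""

    def __init__(self, handlers):
        self.handlers = handlers         # command name -> callable
        self.sync_queue = queue.Queue()  # ONE queue shared by all monitors
        self.oob_optin = set()           # {(monitor_id, command)}
        self.log = []                    # (monitor_id, command, result)
        self.log_lock = threading.Lock()
        # Stand-in for the main loop: the single consumer of sync_queue.
        threading.Thread(target=self._drain, daemon=True).start()

    def opt_in(self, monitor_id, command):
        # Scoped to one command on one monitor connection only.
        self.oob_optin.add((monitor_id, command))

    def receive(self, monitor_id, command):
        """Called from a monitor's I/O thread for every command it reads."""
        if (monitor_id, command) in self.oob_optin:
            # Out-of-band: run now, regardless of pending sync commands.
            # Only acceptable for commands that cannot block.
            self._run(monitor_id, command)
        else:
            self.sync_queue.put((monitor_id, command))

    def _drain(self):
        # Sync commands from all monitors run strictly in arrival order.
        while True:
            monitor_id, command = self.sync_queue.get()
            self._run(monitor_id, command)
            self.sync_queue.task_done()

    def _run(self, monitor_id, command):
        result = self.handlers[command]()
        with self.log_lock:
            self.log.append((monitor_id, command, result))


release = threading.Event()

def hanging_cmd():
    release.wait()  # simulates a sync command stuck in the main loop
    return "done"

d = Dispatcher({"hanging-cmd": hanging_cmd, "migrate-cancel": lambda: "cancelled"})
d.opt_in("mon0", "migrate-cancel")
d.receive("mon0", "hanging-cmd")     # occupies the main-loop consumer
d.receive("mon1", "migrate-cancel")  # mon1 did not opt in: queued behind it
d.receive("mon0", "migrate-cancel")  # mon0 opted in: runs immediately
snapshot = list(d.log)               # [('mon0', 'migrate-cancel', 'cancelled')]
release.set()
d.sync_queue.join()                  # let the sync commands drain, in order
print(snapshot)
print(d.log)
```

The same command from a monitor that did not opt in stays serialized behind the hanging command, which matches the per monitor+command scoping.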

>> How can we determine whether a certain synchronous command can hang?
>> Note that with opt-in async, *all* commands are also synchronous
>> commands.
>> 
>> In short, explain to me how exactly you plan to ensure that certain QMP
>> commands (such as post-copy recovery) can always "get through", in the
>> presence of multiple monitors, hanging main loop, hanging synchronous
>> commands, hanging whatever-else-can-now-hang-in-this-post-copy-world.
>
> Taking migrate-cancel as the example. The migration code already has
> a background thread doing work independently of the main loop. Upon
> marking the migrate-cancel command as async, the migration control
> thread would become the consumer of migrate-cancel.

From 30,000 feet, the QMP monitor sends a "cancel" message to the
migration thread, and later receives a "canceled" message from the
migration thread.

From 300 feet, we use the migrate-cancel QMP command as the cancel
message, and its success response as the "canceled" message.

In other words, we're pressing the external QM-Protocol into service as
an internal message-passing protocol.

>                                                     This allows the
> migration operation to be cancelled immediately, regardless of whether
> there are earlier monitor commands blocked in the main loop.

The necessary part is moving all operations that can block out of
whatever loop runs the monitor, be it the main loop, some other event
loop, or a dedicated monitor thread's monitor loop.

Moving out non-blocking operations isn't necessary.  migrate-cancel
could communicate with the migration thread by any suitable mechanism or
protocol.  It doesn't have to be QMP.  Why would we want it to be QMP?
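The "cancel" / "canceled" exchange needs no QMP at all. A minimal sketch, assuming a migration worker that does its work in chunks and checks a control queue between chunks; the function name, the chunk loop and the message strings are hypothetical, and QEMU's actual migration thread is C and works differently.

```python
import queue
import threading
import time

# Hypothetical sketch: plain in-process queues carry "cancel" to the
# worker and "canceled" back to the monitor side.

def migration_thread(control, replies, chunks):
    sent = 0
    while sent < chunks:
        try:
            msg = control.get_nowait()  # non-blocking check for a request
        except queue.Empty:
            msg = None
        if msg == "cancel":
            replies.put(("canceled", sent))
            return
        time.sleep(0.01)  # stands in for sending one chunk of guest RAM
        sent += 1
    replies.put(("completed", sent))

control, replies = queue.Queue(), queue.Queue()
worker = threading.Thread(target=migration_thread, args=(control, replies, 1000))
worker.start()
time.sleep(0.05)
control.put("cancel")                          # sending "cancel" never blocks
status, chunks_sent = replies.get(timeout=5)   # here we simply wait for "canceled"
worker.join()
print(status, chunks_sent)
```

Sending the request is a non-blocking enqueue, so the monitor side stays available; how the reply is surfaced to the client (a command response, an event) is a separate choice.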

> Of course this assumes the migration control thread can't block
> for locks held by the main thread.

Thanks for your answers, they help.

>> Now let's talk about QMP requirements.
>> 
>> Any addition to QMP must consider what exists already.
>> 
>> You may add more of the same.
>> 
>> You may generalize existing stuff.
>> 
>> You may change existing stuff if you have sufficient reason, subject to
>> backward compatibility constraints.
>> 
>> But attempts to add new ways to do the same old stuff without properly
>> integrating the existing ways are not going to fly.
>> 
>> In particular, any new way to start some job, monitor and control it
>> while it lives, get notified about its state changes and so forth must
>> integrate the existing ways.  These include block jobs (probably the
>> most sophisticated of the lot), migration, dump-guest-memory, and
>> possibly more.  They all work the same way: synchronous command to kick
>> off the job, more synchronous commands to monitor and control, events to
>> notify.  They do differ in detail.
>> 
>> Asynchronous commands are a new way to do this.  When you only need to
>> be notified on "done", and don't need to monitor / control, they fit the
>> bill quite neatly.
>> 
>> However, we can't just ignore the cases where we need more than that!
>> For those, we want a single generic solution instead of the several ad
>> hoc solutions we have now.
>> 
>> If we add asynchronous commands *now*, and for simple cases only, we add
>> yet another special case for a future generic solution to integrate.
>> I'm not going to let that happen.
>
> With the async commands suggestion, while it would initially not
> provide a way to query incremental status, that could easily be
> fitted in.

This is [*] below.

>             Because command replies from async commands may be
> out-of-order wrt the original requests, clients would need to
> provide a unique ID for each command run. This was originally
> part of the QMP spec but was later dropped; libvirt nonetheless
> still generates a unique ID for every QMP command.
>
> Given this, one option is to actually use the QMP command ID as
> a job ID, and let you query ongoing status via some new QMP
> command that accepts the ID of the job to be queried. A complexity
> with this is how to make the jobs visible across multiple QMP
> monitors. The job ID might actually have to be a combination of
> the serial ID from the QMP command and the ID of the monitor
> chardev.

Yes.  The job ID must be unique across all QMP monitors to make
broadcast notifications work.
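Why the composite ID matters for broadcast can be shown with a small sketch, assuming job IDs are formed from the monitor chardev ID plus the client-supplied command ID, and that job events go to every monitor. The JobId and JobRegistry types, their methods and the "JOB_DONE" event name are hypothetical; the start/query/event shape mirrors the existing job-like commands described in this thread.

```python
from dataclasses import dataclass

# Hypothetical sketch, not QEMU code: names and the event format are assumptions.

@dataclass(frozen=True)
class JobId:
    monitor: str      # ID of the monitor chardev the command arrived on
    command_id: str   # the ID the client attached to the command


class JobRegistry:
    def __init__(self):
        self.jobs = {}      # JobId -> status string
        self.inboxes = {}   # monitor ID -> list of events delivered to it

    def attach_monitor(self, monitor):
        self.inboxes[monitor] = []

    def start(self, monitor, command_id):
        """Synchronous 'start' command: registers the job and returns at once."""
        jid = JobId(monitor, command_id)
        if jid in self.jobs:
            raise ValueError(f"job {jid} already exists")
        self.jobs[jid] = "running"
        return jid

    def query(self, jid):
        """Synchronous 'query' command: report the job's current status."""
        return self.jobs[jid]

    def finish(self, jid):
        """Mark the job done and broadcast an event to every monitor."""
        self.jobs[jid] = "done"
        event = {"event": "JOB_DONE",
                 "data": {"monitor": jid.monitor, "id": jid.command_id}}
        for inbox in self.inboxes.values():
            inbox.append(event)


reg = JobRegistry()
reg.attach_monitor("mon0")
reg.attach_monitor("mon1")
a = reg.start("mon0", "1")   # both clients happen to use command ID "1"...
b = reg.start("mon1", "1")   # ...but the composite job IDs still differ
reg.finish(a)
print(reg.query(a), reg.query(b))
print(reg.inboxes["mon1"])
```

With only the client's command ID, the client on mon1 could not tell that the broadcast event refers to someone else's job "1".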

>> I figure the closest to a generic solution we have is block jobs.
>> Perhaps a generic solution could be had by abstracting away the "block"
>> from "block jobs", leaving just "jobs".

[*] starts here:

>> Another approach is generalizing the asynchronous command proposal to
>> fully cover the not-so-simple cases.

We know asynchronous commands "fully cover" when we can use them to
replace all the existing job-like commands.

Until then, they enlarge rather than solve our jobs problem.

I get the need for an available monitor.  But I need to balance it with
other needs.  Can we find a solution for our monitor availability
problem that doesn't enlarge our jobs problem?

>> If you'd rather want to make progress on monitor availability without
>> cracking the "jobs" problem, you're in luck!  Use your license to "add
>> more of the same": synchronous command to start a job, query to monitor,
>> event to notify.
>> 
>> If you insist on tying your monitor availability solution to
>> asynchronous commands, then I'm in luck!  I just found volunteers to
>> solve the "jobs" problem for me.