qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v5 00/21] blockjobs: add explicit job management


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v5 00/21] blockjobs: add explicit job management
Date: Wed, 18 Apr 2018 09:25:14 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)

John Snow <address@hidden> writes:

> On 04/17/2018 09:44 AM, Markus Armbruster wrote:
>> John Snow <address@hidden> writes:
>> 
>>> This series seeks to address two distinct but closely related issues
>>> concerning the job management API.
>>>
>>> (1) For jobs that complete when a monitor is not attached and receiving
>>>     events or notifications, there's no way to discern the job's final
>>>     return code. Jobs must remain in the query list until dismissed
>>>     for reliable management.
>>>
>>> (2) Jobs that change the block graph structure at an indeterminate point
>>>     after the job starts compete with the management layer that relies
>>>     on that graph structure to issue meaningful commands.
>>>
>>>     This structure should change only at the behest of the management
>>>     API, and not asynchronously at unknown points in time. Before a job
>>>     issues such changes, it must rely on explicit and synchronous
>>>     confirmation from the management API.
>>>
>>> These changes are implemented by formalizing a State Transition Machine
>>> for the BlockJob subsystem.
>>>
>>> Job States:
>>>
>>> UNDEFINED       Default state. Internal state only.
>>> CREATED         Job has been created
>>> RUNNING         Job has been started and is running
>>> PAUSED          Job is not ready and has been paused
>>> READY           Job is ready and is running
>>> STANDBY         Job is ready and is paused
>>>
>>> WAITING         Job is waiting on peers in transaction
>>> PENDING         Job is waiting on ACK from QMP
>>> ABORTING        Job is aborting or has been cancelled
>>> CONCLUDED       Job has finished and has a retcode available
>>> NULL            Job is being dismantled. Internal state only.
>>>
>>> Job Verbs:
>>>
>
> Backporting your quote up here:
>
>> For each job verb and job state: what's the new job state?
>>
>
> That's not always 1:1, though I tried to address it in the commit messages.

Let me rephrase my question then.  For each job verb and job state: what
are the possible new job states?  If there's more than one, what's the
condition for each?

I appreciate commit messages explaining that, but having complete state
machine documentation in one place (a comment or in docs/) would be
nice, wouldn't it?

>>> CANCEL          Instructs a running job to terminate with error,
>>>                 (Except when that job is READY, which produces no error.)
>
> CANCEL will take a job to either NULL... (this is the early abort
> pathway, prior to the job being fully realized.)
>
> ...or to ABORTING (from CREATED once it has fully realized the job, or
> from RUNNING, READY, WAITING, or PENDING.)
>
>>> PAUSE           Request a job to pause.
>
> issued to RUNNING or READY, transitions to PAUSED or STANDBY respectively.
>
>>> RESUME          Request a job to resume from a pause.
>
> issued to PAUSED or STANDBY, transitions to RUNNING or READY respectively.
>
>>> SET-SPEED       Change the speed limiting parameter of a job.
>
> No run state change.
>
>>> COMPLETE        Ask a READY job to finish and exit.
>>>
>
> Issued to a READY job, transitions to WAITING.
>
>>> FINALIZE        Ask a PENDING job to perform its graph finalization.
>
> Issued to a PENDING job, transitions to CONCLUDED.
>
>>> DISMISS         Finish cleaning up an empty job.
>> 
>
> Issued to a CONCLUDED job, transitions to NULL.
>
>
>>> And here's my stab at a diagram:
>>>
>>>                  +---------+
>>>                  |UNDEFINED|
>>>                  +--+------+
>>>                     |
>>>                  +--v----+
>>>        +---------+CREATED+-----------------+
>>>        |         +--+----+                 |
>>>        |            |                      |
>>>        |         +--+----+     +------+    |
>>>        +---------+RUNNING<----->PAUSED|    |
>>>        |         +--+-+--+     +------+    |
>>>        |            | |                    |
>>>        |            | +------------------+ |
>>>        |            |                    | |
>>>        |         +--v--+       +-------+ | |
>>>        +---------+READY<------->STANDBY| | |
>>>        |         +--+--+       +-------+ | |
>>>        |            |                    | |
>>>        |         +--v----+               | |
>>>        +---------+WAITING<---------------+ |
>>>        |         +--+----+                 |
>>>        |            |                      |
>>>        |         +--v----+                 |
>>>        +---------+PENDING|                 |
>>>        |         +--+----+                 |
>>>        |            |                      |
>>>     +--v-----+   +--v------+               |
>>>     |ABORTING+--->CONCLUDED|               |
>>>     +--------+   +--+------+               |
>>>                     |                      |
>>>                  +--v-+                    |
>>>                  |NULL<--------------------+
>>>                  +----+
>> 
>> Is this diagram missing a few arrowheads?  E.g. on the edge between
>> RUNNING and WAITING.
>> 
>
> Apparently yes. :\
>
> (Secretly fixed up in my reply.)
>
>> Might push the limits of ASCII art, but here goes anyway: can we label
>> the arrows with job verbs?
>> 
>
> Can you recommend a tool to help me do that? I've been using asciiflow
> infinity (http://asciiflow.com) and it's not very good, but I don't have
> anything better.

I do my ASCII art in Emacs picture-mode.

>> Can you briefly explain how this state machine addresses (1) and (2)?
>> 
>
> (1) The CONCLUDED state allows jobs to persist in the job query list
> after they would have disappeared in 2.11-era QEMU. This lets us query
> for completion codes and to dismiss the job at our own leisure.

Got it.

> (2) The PENDING state allows jobs to wait in a nearly-completed state,
> pending authorization from the QMP client to make graph changes.
> Otherwise, the job has to asynchronously perform this cleanup and the
> exact point in time is unknowable to the QMP client. By making a PENDING
> state and a finalize callback (.prepare), we can make this portion of a
> job's task synchronous.

This provides for jobs modifying the graph on job completion.  It
doesn't provide for jobs modifying the graph while they run.  Fine with
me; we're not aware of a use for messing with the graph in the middle of
a job.

> "John, you added more than two states..."
>
> Yup, this was to help simplify the existing state machine, believe it or
> not. I modeled all jobs as transactions to eliminate different cleanup
> routing and added two new interim states;
>
> - WAITING
> - ABORTING
>
> to help make assertions about the valid transitions jobs can make. The
> ABORTING state helps make it clear when a job is allowed to fail (and
> emit QMP events related to such).
>
> The WAITING state is simply advisory to help a client know that a job is
> "finished" but cannot yet receive further instruction because of peers
> in a transaction. This helps me to add nice QMP errors for any verbs
> issued to such jobs. "Sorry pal, this job is waiting and can't hear you
> right now!"
>
> This kept the code cleaner than adding a bunch of very fragile boolean
> error-checking pathways in dozens of helper functions to help avoid
> illegal instructions on jobs not prepared to receive those instructions.
>
> So these two new states don't help accomplish (1) or (2) strictly, but
> they do facilitate the code additions that _do_ a lot less ugly.

Thanks!

Looks like a fine starting point for in-tree state machine documentation
:)



reply via email to

[Prev in Thread] Current Thread [Next in Thread]