[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [Qemu-devel] [PATCH v5 00/21] blockjobs: add explicit j
From: |
Markus Armbruster |
Subject: |
Re: [Qemu-block] [Qemu-devel] [PATCH v5 00/21] blockjobs: add explicit job management |
Date: |
Wed, 18 Apr 2018 19:42:12 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) |
John Snow <address@hidden> writes:
> On 04/18/2018 03:25 AM, Markus Armbruster wrote:
>> John Snow <address@hidden> writes:
>>
>>> On 04/17/2018 09:44 AM, Markus Armbruster wrote:
>>>> John Snow <address@hidden> writes:
>>>>
>>>>> This series seeks to address two distinct but closely related issues
>>>>> concerning the job management API.
>>>>>
>>>>> (1) For jobs that complete when a monitor is not attached and receiving
>>>>> events or notifications, there's no way to discern the job's final
>>>>> return code. Jobs must remain in the query list until dismissed
>>>>> for reliable management.
>>>>>
>>>>> (2) Jobs that change the block graph structure at an indeterminate point
>>>>> after the job starts compete with the management layer that relies
>>>>> on that graph structure to issue meaningful commands.
>>>>>
>>>>> This structure should change only at the behest of the management
>>>>> API, and not asynchronously at unknown points in time. Before a job
>>>>> issues such changes, it must rely on explicit and synchronous
>>>>> confirmation from the management API.
>>>>>
>>>>> These changes are implemented by formalizing a State Transition Machine
>>>>> for the BlockJob subsystem.
>>>>>
>>>>> Job States:
>>>>>
>>>>> UNDEFINED Default state. Internal state only.
>>>>> CREATED Job has been created
>>>>> RUNNING Job has been started and is running
>>>>> PAUSED Job is not ready and has been paused
>>>>> READY Job is ready and is running
>>>>> STANDBY Job is ready and is paused
>>>>>
>>>>> WAITING Job is waiting on peers in transaction
>>>>> PENDING Job is waiting on ACK from QMP
>>>>> ABORTING Job is aborting or has been cancelled
>>>>> CONCLUDED Job has finished and has a retcode available
>>>>> NULL Job is being dismantled. Internal state only.
>>>>>
>>>>> Job Verbs:
>>>>>
>>>
>>> Backporting your quote up here:
>>>
>>>> For each job verb and job state: what's the new job state?
>>>>
>>>
>>> That's not always 1:1, though I tried to address it in the commit messages.
>>
>> Let me rephrase my question then. For each job verb and job state: what
>> are the possible new job states? If there's more than one, what's the
>> condition for each?
>>
>
> Is my answer below not sufficient? Maybe you're asking "Can you write
> this up in a formal document" instead, or did I miss explaining something?
Your answer was fine. I blame my one-pass reply writing.
Nevertheless:
>> I appreciate commit messages explaining that, but having complete state
>> machine documentation in one place (a comment or in docs/) would be
>> nice, wouldn't it?
Pretty-please?
>>>>> CANCEL Instructs a running job to terminate with error,
>>>>> (Except when that job is READY, which produces no error.)
>>>
>>> CANCEL will take a job to either NULL... (this is the early abort
>>> pathway, prior to the job being fully realized.)
>>>
>>> ...or to ABORTING (from CREATED once it has fully realized the job, or
>>> from RUNNING, READY, WAITING, or PENDING.)
>>>
>>>>> PAUSE Request a job to pause.
>>>
>>> issued to RUNNING or READY, transitions to PAUSED or STANDBY respectively.
>>>
>>>>> RESUME Request a job to resume from a pause.
>>>
>>> issued to PAUSED or STANDBY, transitions to RUNNING or READY respectively.
>>>
>>>>> SET-SPEED Change the speed limiting parameter of a job.
>>>
>>> No run state change.
>>>
>>>>> COMPLETE Ask a READY job to finish and exit.
>>>>>
>>>
>>> Issued to a READY job, transitions to WAITING.
>>>
>>>>> FINALIZE Ask a PENDING job to perform its graph finalization.
>>>
>>> Issued to a PENDING job, transitions to CONCLUDED.
>>>
>>>>> DISMISS Finish cleaning up an empty job.
>>>>
>>>
>>> Issued to a CONCLUDED job, transitions to NULL.
>>>
>>>
>>>>> And here's my stab at a diagram:
>>>>>
>>>>> +---------+
>>>>> |UNDEFINED|
>>>>> +--+------+
>>>>> |
>>>>> +--v----+
>>>>> +---------+CREATED+-----------------+
>>>>> | +--+----+ |
>>>>> | | |
>>>>> | +--+----+ +------+ |
>>>>> +---------+RUNNING<----->PAUSED| |
>>>>> | +--+-+--+ +------+ |
>>>>> | | | |
>>>>> | | +------------------+ |
>>>>> | | | |
>>>>> | +--v--+ +-------+ | |
>>>>> +---------+READY<------->STANDBY| | |
>>>>> | +--+--+ +-------+ | |
>>>>> | | | |
>>>>> | +--v----+ | |
>>>>> +---------+WAITING<---------------+ |
>>>>> | +--+----+ |
>>>>> | | |
>>>>> | +--v----+ |
>>>>> +---------+PENDING| |
>>>>> | +--+----+ |
>>>>> | | |
>>>>> +--v-----+ +--v------+ |
>>>>> |ABORTING+--->CONCLUDED| |
>>>>> +--------+ +--+------+ |
>>>>> | |
>>>>> +--v-+ |
>>>>> |NULL<--------------------+
>>>>> +----+
>>>>
>>>> Is this diagram missing a few arrowheads? E.g. on the edge between
>>>> RUNNING and WAITING.
>>>>
>>>
>>> Apparently yes. :\
>>>
>>> (Secretly fixed up in my reply.)
>>>
>>>> Might push the limits of ASCII art, but here goes anyway: can we label
>>>> the arrows with job verbs?
>>>>
>>>
>>> Can you recommend a tool to help me do that? I've been using asciiflow
>>> infinity (http://asciiflow.com) and it's not very good, but I don't have
>>> anything better.
>>
>> I do my ASCII art in Emacs picture-mode.
>>
>>>> Can you briefly explain how this state machine addresses (1) and (2)?
>>>>
>>>
>>> (1) The CONCLUDED state allows jobs to persist in the job query list
>>> after they would have disappeared in 2.11-era QEMU. This lets us query
>>> for completion codes and to dismiss the job at our own leisure.
>>
>> Got it.
>>
>>> (2) The PENDING state allows jobs to wait in a nearly-completed state,
>>> pending authorization from the QMP client to make graph changes.
>>> Otherwise, the job has to asynchronously perform this cleanup and the
>>> exact point in time is unknowable to the QMP client. By making a PENDING
>>> state and a finalize callback (.prepare), we can make this portion of a
>>> job's task synchronous.
>>
>> This provides for jobs modifying the graph on job completion. It
>> doesn't provide for jobs modifying the graph while they run. Fine with
>> me; we're not aware of a use for messing with the graph in the middle of
>> a job.
>>
>
> I didn't consider this possibility. The concept could in theory be
> expanded to arbitrary sync points, but I'm not going to worry about that
> until the need arises.
Makes sense.
>>> "John, you added more than two states..."
>>>
>>> Yup, this was to help simplify the existing state machine, believe it or
>>> not. I modeled all jobs as transactions to eliminate different cleanup
>>> routing and added two new interim states;
>>>
>>> - WAITING
>>> - ABORTING
>>>
>>> to help make assertions about the valid transitions jobs can make. The
>>> ABORTING state helps make it clear when a job is allowed to fail (and
>>> emit QMP events related to such).
>>>
>>> The WAITING state is simply advisory to help a client know that a job is
>>> "finished" but cannot yet receive further instruction because of peers
>>> in a transaction. This helps me to add nice QMP errors for any verbs
>>> issued to such jobs. "Sorry pal, this job is waiting and can't hear you
>>> right now!"
>>>
>>> This kept the code cleaner than adding a bunch of very fragile boolean
>>> error-checking pathways in dozens of helper functions to help avoid
>>> illegal instructions on jobs not prepared to receive those instructions.
>>>
>>> So these two new states don't help accomplish (1) or (2) strictly, but
>>> they do facilitate the code additions that _do_ a lot less ugly.
>>
>
> I really bungled that sentence.
No problem, I got it anyway :)
>> Thanks!
>>
>> Looks like a fine starting point for in-tree state machine documentation
>> :)