On 08.04.21 19:26, Vladimir Sementsov-Ogievskiy wrote:
08.04.2021 20:04, John Snow wrote:
On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote:
job-complete command is async. Can we instead just add a boolean
like job->completion_requested, and set it if job-complete called
in STANDBY state, and on job_resume job_complete will be called
automatically if this boolean is true?
job_complete has a synchronous setup, though -- we lose out on a lot
of synchronous error checking in that circumstance.
yes, that's a problem..
I was not able to audit it to determine that it'd be safe to attempt
that setup during a drained section -- I imagine it won't work and
will fail, though.
So I thought we'd have to signal completion and run the setup
*later*, but what do we do if we get an error then? Does the entire
job fail? Do we emit some new event? ("BLOCK_JOB_COMPLETION_FAILED"
?) Is it recoverable?
Isn't it possible even now, that after successful job-complete job
still fails and we report BLOCK_JOB_COMPLETED with error?
And actually, how much benefit user get from the fact that
job-complete may fail?
We can make job-complete a simple always-success boolean flag setter
like job-pause.
I wanted to say the following:
But job-pause does always succeed, in contrast to block-job-complete.
block-job-complete is more akin to job-finalize, which too is a
synchronous operation.
But when I wrote that last sentence, I asked myself whether what
mirror_complete() does isn’t actually a remnant of what we had to do
when we didn’t have job-finalize yet. Shouldn’t that all be in
mirror_exit_common()? What’s the advantage of opening the backing
chain or putting blockers on the to-replace node in
block-job-complete? Aren’t that all graph-changing operation,
basically, i.e. stuff that should be done in job-finalize?
If we move everything to mirror_exit_common(), all that remains to do
is basically set some should_complete flag (could even be part of the
Job struct), and then the whole problem disappears.
Thoughts?