
From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH v2 0/4] blockjobs: add explicit job reaping
Date: Wed, 4 Oct 2017 20:27:32 +0200
User-agent: Mutt/1.9.0 (2017-09-02)

On 04.10.2017 at 03:52, John Snow wrote:
> For jobs that complete when a monitor isn't looking, there's no way to
> tell what the job's final return code was. We need to allow jobs to
> remain in the list until queried for reliable management.

Just a short summary of what I discussed with John on IRC:

Another important reason why we want an explicit end to block jobs is
that job completion often changes the block graph. For a management tool
that manages the block graph on a node level, it is a big problem if
graph changes can happen at any arbitrary point, because that can lead
to bad race conditions. Giving the management tool control over the end
of the block job makes it aware of when graph changes happen.

This means that compared to this RFC series, we need to move the waiting
earlier in the process:

1. Block job is done and calls block_job_completed()
2. Wait for other block jobs in the same job transaction to complete
3. Send a (new) QMP event to the management tool to notify it that the
   job is ready to be reaped
4. On block-job-reap, call the .commit/.abort callbacks of the jobs in
   the transaction. They will do most of the work that is currently done
   in the completion callbacks, in particular any graph changes. If we
   need to allow error cases, we can add a .prepare_completion callback
   that can still let the whole transaction fail.
5. Send the final QMP completion event for each job in the transaction
   with the final error code. This is the existing BLOCK_JOB_COMPLETED
   or BLOCK_JOB_CANCELLED event.

The current RFC still executes .commit/.abort before block-job-reap, so
the graph changes happen too early to be under control of the management
tool.

> RFC:
> The next version will add tests for transactions.
> Kevin, can you please take a look at bdrv_is_root_node and how it is
> used with respect to do_drive_backup? I suspect that in this case
> "is root" should actually be "true", but a node in use by a job has
> two roles, child_root and child_job, so it starts returning false here.
> 
> That's fine because we prevent a collision that way, but it makes the
> error messages pretty bad and misleading. Do you have a quick suggestion?
> (Should I just amend the loop to allow non-root nodes as long as they
> happen to be jobs so that the job creation code can check permissions?)

We kind of sidestepped this problem by deciding that there is no real
reason for the backup job to require the source to be a root node.

I'm not completely sure how we could easily get a better message while
still requiring a root node (and I suppose there are other places that
rightfully still require a root node). Ignoring children with child_job
feels ugly, but might be the best option.

Note that this will not suddenly make the conflicting command work,
because every node that has a child_job parent also has op blockers for
everything, but the error message should be less confusing than "is not
a root node".

Kevin


