qemu-devel

Re: [Qemu-devel] [PATCH 0/5] blockjobs: Fix transactional race condition


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH 0/5] blockjobs: Fix transactional race condition
Date: Wed, 28 Sep 2016 15:16:33 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0

Ok, let's discuss it on list.

On 27.09.2016 19:19, John Snow wrote:
Replying off-list to follow your lead, but this might be a good discussion to have upstream where Stefan and Jeff can see it.

On 09/26/2016 11:53 AM, Vladimir Sementsov-Ogievskiy wrote:
Hi John!

What is the status of this series? If you are still working on it, I can
tell you about another related problem.


I got mired in trying to think about how to delete a BlockJob that hasn't technically been started yet -- this patch series is unsafe in that it will be unable to "cancel" a job that hasn't started yet.

(And I don't want to start a job just to cancel it, that's gross.)

I need to expand the job modifications here a little bit to make this completely safe. I may need to create a job->started bool that allows you to call a job_delete() function if and only if started is false.

I'll try to send this along this week.
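Roughly, the lifecycle I have in mind, as a toy Python model (all names here are invented for illustration, not the real QEMU API):

```python
class BlockJob:
    """Toy model of a job that may be deleted only before it is started."""

    def __init__(self, name):
        self.name = name
        self.started = False
        self.deleted = False

    def start(self):
        # Once started, teardown must go through the cancel path instead.
        assert not self.deleted
        self.started = True

    def delete(self):
        # Safe only while the job has never run: no coroutine exists yet,
        # so there is nothing to cancel or drain.
        if self.started:
            raise RuntimeError("started jobs must be cancelled, not deleted")
        self.deleted = True
```

So a never-started job can be torn down directly, and the gross "start it just to cancel it" dance goes away.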

I'm working on an async scheme for backup. I started with something like
wrapping backup_do_cow in a coroutine and using that coroutine in the
full-backup loop instead of calling backup_do_cow directly. This simple
approach seems better than the one used in mirror: in mirror we create a new
coroutine for each async read and write, whereas I create a new coroutine
per copy (= read + write). But then I decided to rewrite this as a
worker-coroutine scheme. That is, instead of creating a new coroutine per
request and limiting their count with "while (job->inflight_requests >=
MAX_IN_FLIGHT) { yield }", we create MAX_IN_FLIGHT worker coroutines at
backup start, and they just copy cluster after cluster, using a common
copy_bitmap to synchronize. All cluster copying is done in the backup
workers; write notifiers only change the order in which workers copy
clusters. And I liked it all until I got to the error handling.
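To make the scheme concrete, here is a toy Python model of the worker-coroutine idea, with generators standing in for coroutines and a deque standing in for the copy bitmap (all names are illustrative, not the QEMU API):

```python
from collections import deque

MAX_IN_FLIGHT = 4

def worker(wid, pending, copied):
    # A worker copies cluster after cluster; the bare yield stands in for
    # the coroutine yielding while its read+write is in flight.
    while pending:
        cluster = pending.popleft()
        yield  # "I/O in flight"
        copied.append((wid, cluster))

def run_backup(n_clusters, n_workers=MAX_IN_FLIGHT):
    pending = deque(range(n_clusters))  # stands in for the copy bitmap
    copied = []
    workers = [worker(i, pending, copied) for i in range(n_workers)]
    # Round-robin scheduler: step each live worker until all have finished.
    while workers:
        live = []
        for w in workers:
            try:
                next(w)
                live.append(w)
            except StopIteration:
                pass
        workers = live
    return copied

def notify_write(pending, cluster):
    # A write notifier copies nothing itself; it only moves the touched
    # cluster to the front so a worker picks it up next.
    if cluster in pending:
        pending.remove(cluster)
        pending.appendleft(cluster)
```

The point is that the pool is fixed at backup start: no per-request coroutine creation, and the in-flight count is bounded by construction.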


Makes sense to me so far.

I can't handle errors in the workers, because I need to call
block_job_error_action, which may stop the block job, and it seems to be a
problem if several workers call it simultaneously. So, the workers have to

What kind of problems are you seeing? Maybe we can fix them. Is it for STOP cases? (Or also on IGNORE/REPORT?)

Otherwise, just stash an error object in a location that the master coroutine can find, pause/terminate all worker threads, and signal the error.

defer error handling to the main coroutine of the block job (in my solution
this coroutine doesn't do any copying, it only handles block-job stuff like
pauses, stops, and throttling). So we will have a queue of workers waiting
for the main coroutine to handle their errors. This looks too complicated...

Really? I don't think it's too bad. The worker detects an error and creates a record of the problem, then the worker yields.

The controller detects the worker has yielded and sees the record of the error. The controller opts not to re-schedule the worker.

The controller then instructs all remaining workers to yield. Once all workers are yielded, the controller invokes block_job_error_action.

From there, we can handle the error accordingly.
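The control flow I'm imagining, again as a toy Python model (invented names; error_action is only a stand-in for block_job_error_action):

```python
class JobState:
    def __init__(self):
        self.error = None   # first error record, stashed by a worker
        self.quit = False   # controller tells remaining workers to wind down

BAD_CLUSTER = 13  # simulate an I/O failure on this cluster

def worker(wid, pending, state, copied):
    while pending and not state.quit:
        cluster = pending.pop()
        yield  # I/O in flight
        if cluster == BAD_CLUSTER:
            if state.error is None:
                state.error = (wid, cluster)  # record the problem...
            return                            # ...and stop this worker
        copied.append(cluster)

def error_action(err):
    # Invoked exactly once, by the controller, after all workers stopped.
    _, cluster = err
    return "report:%d" % cluster

def run(n_clusters, n_workers=2):
    pending = list(range(n_clusters))
    state = JobState()
    copied = []
    live = [worker(i, pending, state, copied) for i in range(n_workers)]
    while live:
        still = []
        for w in live:
            try:
                next(w)
                still.append(w)
            except StopIteration:
                pass
        if state.error is not None:
            state.quit = True  # opt not to schedule any new copies
        live = still
    if state.error is not None:
        return error_action(state.error), copied
    return "ok", copied
```

Only the controller ever invokes the error action, and only after every worker has wound down, so there is no contention on it.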

Either way, it sounds like we're going to have to manage the complexity of what happens when a job with worker coroutines is paused by e.g. the QMP monitor: the controller needs to send and coordinate messages to the workers. So this doesn't sound like too much added complexity, unless I'm dreaming.

And finally I came to the idea that all these problems and complications
arise because block jobs were not designed for asynchronous work: a block
job has only one working coroutine. So the right way is to maintain several
working coroutines in the generic blockjob layer. This would lead to simpler
and more transparent async schemes for backup and mirror (and maybe other
block jobs).

So, the question is: what about refactoring block jobs to move from one
working coroutine to several? Is it hard, and is it possible? I don't
understand all the block-job related stuff...


I think jobs will need to remain "one coroutine, one job" for now, but there's no reason why drive-backup or blockdev-backup can't just create multiple jobs each if that's what they need to do. (The backup job object could, in theory, just have another job pointer to a helper job if it really became necessary.)

Let's try to solve your design problem first and see if we can't make error reporting behave nicely in a contested environment.

Hmm, ok, I'll go this way for now. But anyway, I think that in the end the async scheme in backup and mirror should be the same, and it should share the same code, so all the worker-related logic should be split out of backup into its own file, or pushed into the generic block-job layer.



On 08.08.2016 22:09, John Snow wrote:
There are a few problems with transactional job completion right now.

First, if jobs complete so quickly that they finish before the remaining
jobs get a chance to join the transaction, the completion mode can leave
well-known state, the QLIST can get corrupted, and the transactional jobs
can complete in batches or phases instead of all together.

Second, if two or more jobs defer to the main loop at roughly the same
time, it's possible for one job's cleanup to directly invoke the other
job's cleanup from within the same thread, leading to a situation that
will deadlock the entire transaction.

Thanks to Vladimir for pointing out these modes of failure.
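The direction of the fix, as a toy sketch (invented names, not the actual patches): create every job in the transaction first, and only start them once the transaction is fully populated, so no job can reach completion while the list is still being built.

```python
class Job:
    def __init__(self, name):
        self.name = name
        self.started = False
        self.txn = None

    def start(self):
        self.started = True

class Transaction:
    def __init__(self):
        self.jobs = []  # stands in for the QLIST of transaction members

    def add(self, job):
        # Jobs join while still dormant; none of them is running yet.
        assert not job.started
        job.txn = self
        self.jobs.append(job)

    def start_all(self):
        # Only now do the jobs run, so an instantly-completing job can no
        # longer observe a half-built transaction list.
        for job in self.jobs:
            job.start()
```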

________________________________________________________________________________


For convenience, this branch is available at:
https://github.com/jnsnow/qemu.git  branch job-manual-start
https://github.com/jnsnow/qemu/tree/job-manual-start

This version is tagged job-manual-start-v1:
https://github.com/jnsnow/qemu/releases/tag/job-manual-start-v1

John Snow (4):
   blockjob: add block_job_start
   blockjob: refactor backup_start as backup_job_create
   blockjob: add .clean property
   iotests: add transactional failure race test

Vladimir Sementsov-Ogievskiy (1):
   blockjob: fix dead pointer in txn list

  block/backup.c             |  50 +++++++-----
  block/commit.c             |   2 +-
  block/mirror.c             |   2 +-
  block/stream.c             |   2 +-
  blockdev.c                 | 194 ++++++++++++++++++++++++++-------------------
  blockjob.c                 |  24 +++++-
  include/block/block_int.h  |  19 ++---
  include/block/blockjob.h   |  16 ++++
  tests/qemu-iotests/124     |  91 +++++++++++++++++++++
  tests/qemu-iotests/124.out |   4 +-
  tests/test-blockjob-txn.c  |   2 +-
  11 files changed, 284 insertions(+), 122 deletions(-)





--
Best regards,
Vladimir



