qemu-devel

Re: [Qemu-devel] [PATCH 0/5] blockjobs: Fix transactional race condition


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH 0/5] blockjobs: Fix transactional race condition
Date: Wed, 28 Sep 2016 15:16:33 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0

Ok, let's discuss it on list.

On 27.09.2016 19:19, John Snow wrote:
Replying off-list to follow your lead, but this might be a good discussion to have upstream where Stefan and Jeff can see it.

On 09/26/2016 11:53 AM, Vladimir Sementsov-Ogievskiy wrote:
Hi John!

What is the status of this series? If you are still working on it, I can
tell you about another related problem.


I got mired in trying to think about how to delete a BlockJob that hasn't technically been started yet -- this patch series is unsafe in that it will be unable to "cancel" a job that hasn't started yet.

(And I don't want to start a job just to cancel it, that's gross.)

I need to expand the job modifications here a little bit to make this completely safe. I may need to create a job->started bool that allows you to call a job_delete() function if and only if started is false.

I'll try to send this along this week.
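Roughly, the lifecycle I have in mind, as a toy Python model (all names here are invented for illustration, not the real QEMU API):

```python
class BlockJob:
    """Toy model of a job that may be deleted only before it is started."""

    def __init__(self, name):
        self.name = name
        self.started = False
        self.deleted = False

    def start(self):
        # Once started, teardown must go through the cancel path instead.
        assert not self.deleted
        self.started = True

    def delete(self):
        # Safe only while the job has never run: no coroutine exists yet,
        # so there is nothing to cancel or drain.
        if self.started:
            raise RuntimeError("started jobs must be cancelled, not deleted")
        self.deleted = True
```

So a never-started job can be torn down directly, and the gross "start it just to cancel it" dance goes away.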

I'm working on an async scheme for backup. I started with something like
wrapping backup_do_cow in a coroutine and using that coroutine in the
full-backup loop instead of calling backup_do_cow directly. This simple
approach seems better than the one used in mirror: in mirror we create a new
coroutine for each async read and write, whereas I create a new coroutine
per copy (= read + write). But then I decided to rewrite this as a
worker-coroutine scheme. That is, instead of creating a new coroutine per
request and limiting their count with "while (job->inflight_requests >=
MAX_IN_FLIGHT) { yield }", we create MAX_IN_FLIGHT worker coroutines at
backup start, and they just copy cluster after cluster, using a common
copy_bitmap to synchronize. All cluster copying is done in the backup
workers; write notifiers only change the order in which workers copy
clusters. And I liked it all until I got to the error handling.
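To make the scheme concrete, here is a toy Python model of the worker-coroutine idea, with generators standing in for coroutines and a deque standing in for the copy bitmap (all names are illustrative, not the QEMU API):

```python
from collections import deque

MAX_IN_FLIGHT = 4

def worker(wid, pending, copied):
    # A worker copies cluster after cluster; the bare yield stands in for
    # the coroutine yielding while its read+write is in flight.
    while pending:
        cluster = pending.popleft()
        yield  # "I/O in flight"
        copied.append((wid, cluster))

def run_backup(n_clusters, n_workers=MAX_IN_FLIGHT):
    pending = deque(range(n_clusters))  # stands in for the copy bitmap
    copied = []
    workers = [worker(i, pending, copied) for i in range(n_workers)]
    # Round-robin scheduler: step each live worker until all have finished.
    while workers:
        live = []
        for w in workers:
            try:
                next(w)
                live.append(w)
            except StopIteration:
                pass
        workers = live
    return copied

def notify_write(pending, cluster):
    # A write notifier copies nothing itself; it only moves the touched
    # cluster to the front so a worker picks it up next.
    if cluster in pending:
        pending.remove(cluster)
        pending.appendleft(cluster)
```

The point is that the pool is fixed at backup start: no per-request coroutine creation, and the in-flight count is bounded by construction.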


Makes sense to me so far.

I can't handle errors in the workers, because I need to call
block_job_error_action, which may stop the block job, and it seems to be a
problem if several workers call it simultaneously. So, the workers have to

What kind of problems are you seeing? Maybe we can fix them. Is it for STOP cases? (Or also on IGNORE/REPORT?)

Otherwise, just stash an error object in a location that the master coroutine can find, pause/terminate all worker threads, and signal the error.

defer error handling to the main coroutine of the block job (in my solution
this coroutine doesn't do any copying, it only handles block-job stuff like
pauses, stops, and throttling). So we will have a queue of workers waiting
for the main coroutine to handle their errors. This looks too complicated...

Really? I don't think it's too bad. The worker detects an error and creates a record of the problem, then the worker yields.

The controller detects the worker has yielded and sees the record of the error. The controller opts not to re-schedule the worker.

The controller then instructs all remaining workers to yield. Once all workers are yielded, the controller invokes block_job_error_action.

From there, we can handle the error accordingly.
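The control flow I'm imagining, again as a toy Python model (invented names; error_action is only a stand-in for block_job_error_action):

```python
class JobState:
    def __init__(self):
        self.error = None   # first error record, stashed by a worker
        self.quit = False   # controller tells remaining workers to wind down

BAD_CLUSTER = 13  # simulate an I/O failure on this cluster

def worker(wid, pending, state, copied):
    while pending and not state.quit:
        cluster = pending.pop()
        yield  # I/O in flight
        if cluster == BAD_CLUSTER:
            if state.error is None:
                state.error = (wid, cluster)  # record the problem...
            return                            # ...and stop this worker
        copied.append(cluster)

def error_action(err):
    # Invoked exactly once, by the controller, after all workers stopped.
    _, cluster = err
    return "report:%d" % cluster

def run(n_clusters, n_workers=2):
    pending = list(range(n_clusters))
    state = JobState()
    copied = []
    live = [worker(i, pending, state, copied) for i in range(n_workers)]
    while live:
        still = []
        for w in live:
            try:
                next(w)
                still.append(w)
            except StopIteration:
                pass
        if state.error is not None:
            state.quit = True  # opt not to schedule any new copies
        live = still
    if state.error is not None:
        return error_action(state.error), copied
    return "ok", copied
```

Only the controller ever invokes the error action, and only after every worker has wound down, so there is no contention on it.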

Either way, it sounds like we're going to have to manage the complexity of what happens when a job with worker coroutines is paused by e.g. the QMP monitor: the controller needs to send and coordinate messages to the workers. So this doesn't sound like too much added complexity, unless I'm dreaming.

And finally I came to the idea that all these problems and complications
arise because block jobs were not designed for asynchronous work: a block
job has only one working coroutine. So the right way is to maintain several
working coroutines in the generic blockjob layer. This would lead to simpler
and more transparent async schemes for backup and mirror (and maybe other
block jobs).

So, the question is: what about refactoring block jobs to move from one
working coroutine to several? Is it hard, and is it possible? I don't
understand all the block-job related stuff...


I think jobs will need to remain "one coroutine, one job" for now, but there's no reason why drive-backup or blockdev-backup can't just create multiple jobs each if that's what they need to do. (The backup job object could, in theory, just have another job pointer to a helper job if it really became necessary.)

Let's try to solve your design problem first and see if we can't make error reporting behave nicely in a contested environment.

Hmm, ok, I'll go this way for now. But anyway, I think that in the end the async scheme in backup and mirror should be the same, and it should share the same code, so all the worker-related logic should be split out of backup into its own file, or pushed into the generic block-job layer.



On 08.08.2016 22:09, John Snow wrote:
There are a few problems with transactional job completion right now.

First, if jobs complete so quickly that they finish before the remaining
jobs get a chance to join the transaction, the completion mode can leave
well-known state, the QLIST can get corrupted, and the transactional jobs
can complete in batches or phases instead of all together.

Second, if two or more jobs defer to the main loop at roughly the same
time, it's possible for one job's cleanup to directly invoke the other
job's cleanup from within the same thread, leading to a situation that
will deadlock the entire transaction.

Thanks to Vladimir for pointing out these modes of failure.
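The direction of the fix, as a toy sketch (invented names, not the actual patches): create every job in the transaction first, and only start them once the transaction is fully populated, so no job can reach completion while the list is still being built.

```python
class Job:
    def __init__(self, name):
        self.name = name
        self.started = False
        self.txn = None

    def start(self):
        self.started = True

class Transaction:
    def __init__(self):
        self.jobs = []  # stands in for the QLIST of transaction members

    def add(self, job):
        # Jobs join while still dormant; none of them is running yet.
        assert not job.started
        job.txn = self
        self.jobs.append(job)

    def start_all(self):
        # Only now do the jobs run, so an instantly-completing job can no
        # longer observe a half-built transaction list.
        for job in self.jobs:
            job.start()
```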

________________________________________________________________________________


For convenience, this branch is available at:
https://github.com/jnsnow/qemu.git  branch job-manual-start
https://github.com/jnsnow/qemu/tree/job-manual-start

This version is tagged job-manual-start-v1:
https://github.com/jnsnow/qemu/releases/tag/job-manual-start-v1

John Snow (4):
   blockjob: add block_job_start
   blockjob: refactor backup_start as backup_job_create
   blockjob: add .clean property
   iotests: add transactional failure race test

Vladimir Sementsov-Ogievskiy (1):
   blockjob: fix dead pointer in txn list

  block/backup.c             |  50 +++++++-----
  block/commit.c             |   2 +-
  block/mirror.c             |   2 +-
  block/stream.c             |   2 +-
  blockdev.c                 | 194 ++++++++++++++++++++++++++-------------------
  blockjob.c                 |  24 +++++-
  include/block/block_int.h  |  19 ++---
  include/block/blockjob.h   |  16 ++++
  tests/qemu-iotests/124     |  91 +++++++++++++++++++++
  tests/qemu-iotests/124.out |   4 +-
  tests/test-blockjob-txn.c  |   2 +-
  11 files changed, 284 insertions(+), 122 deletions(-)





--
Best regards,
Vladimir



