[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [Qemu-devel] [PATCH] block/mirror: change the semantic
From: |
Liang Li |
Subject: |
Re: [Qemu-block] [Qemu-devel] [PATCH] block/mirror: change the semantic of 'force' of block-job-cancel |
Date: |
Tue, 6 Feb 2018 11:11:49 +0800 |
User-agent: |
Mutt/1.7.2 (2016-11-26) |
On Mon, Feb 05, 2018 at 02:28:55PM -0500, John Snow wrote:
>
>
> On 01/31/2018 09:19 PM, Liang Li wrote:
> > On Tue, Jan 30, 2018 at 03:18:31PM -0500, John Snow wrote:
> >>
> >>
> >> On 01/30/2018 03:38 AM, Liang Li wrote:
> >>> When doing drive mirror to a low speed shared storage, if there was heavy
> >>> BLK IO write workload in VM after the 'ready' event, drive mirror block
> >>> job
> >>> can't be canceled immediately, it would keep running until the heavy BLK
> >>> IO
> >>> workload stopped in the VM.
> >>>
> >>> Because libvirt depends on block-job-cancel for block live migration, the
> >>> current block-job-cancel has the semantic to make sure data is in sync
> >>> after
> >>> the 'ready' event. This semantic can't meet some requirement, for
> >>> example,
> >>> people may use drive mirror for realtime backup while need the ability of
> >>> block live migration. If drive mirror can't not be cancelled immediately,
> >>> it means block live migration need to wait, because libvirt make use drive
> >>> mirror to implement block live migration and only one drive mirror block
> >>> job is allowed at the same time for a give block dev.
> >>>
> >>> We need a new interface for 'force cancel', which could quit block job
> >>> immediately if don't care about whether data is in sync or not.
> >>>
> >>> 'force' is not used by libvirt currently, to make things simple, change
> >>> it's semantic slightly, hope it will not break some use case which need
> >>> its
> >>> original semantic.
> >>>
> >>> Cc: Paolo Bonzini <address@hidden>
> >>> Cc: Jeff Cody <address@hidden>
> >>> Cc: Kevin Wolf <address@hidden>
> >>> Cc: Max Reitz <address@hidden>
> >>> Cc: Eric Blake <address@hidden>
> >>> Cc: John Snow <address@hidden>
> >>> Reported-by: Huaitong Han <address@hidden>
> >>> Signed-off-by: Huaitong Han <address@hidden>
> >>> Signed-off-by: Liang Li <address@hidden>
> >>> ---
> >>> block/mirror.c | 9 +++------
> >>> blockdev.c | 4 ++--
> >>> blockjob.c | 11 ++++++-----
> >>> hmp-commands.hx | 3 ++-
> >>> include/block/blockjob.h | 9 ++++++++-
> >>> qapi/block-core.json | 6 ++++--
> >>> tests/test-blockjob-txn.c | 8 ++++----
> >>> 7 files changed, 29 insertions(+), 21 deletions(-)
> >>>
> >>> diff --git a/block/mirror.c b/block/mirror.c
> >>> index c9badc1..c22dff9 100644
> >>> --- a/block/mirror.c
> >>> +++ b/block/mirror.c
> >>> @@ -869,11 +869,8 @@ static void coroutine_fn mirror_run(void *opaque)
> >>>
> >>> ret = 0;
> >>> trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
> >>> - if (!s->synced) {
> >>> - block_job_sleep_ns(&s->common, delay_ns);
> >>> - if (block_job_is_cancelled(&s->common)) {
> >>> - break;
> >>> - }
> >>> + if (block_job_is_cancelled(&s->common) && s->common.force) {
> >>> + break;
> >>
> >> what's the justification for removing the sleep in the case that
> >> !s->synced && !block_job_is_cancelled(...) ?
> >>
> > if !block_job_is_cancelled() satisfied, the code in 'if (!should_complete)
> > {}'
> > will execute, there is a block_job_sleep_ns there.
> >
> > block_job_sleep_ns is for rate throttling, if there is no more data to
> > sync,
> > sleep is not needed, right?
> >
> >>> } else if (!should_complete) {
> >>> delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
> >>> block_job_sleep_ns(&s->common, delay_ns);
> >>> @@ -887,7 +884,7 @@ immediate_exit:
> >>> * or it was cancelled prematurely so that we do not guarantee
> >>> that
> >>> * the target is a copy of the source.
> >>> */
> >>> - assert(ret < 0 || (!s->synced &&
> >>> block_job_is_cancelled(&s->common)));
> >>> + assert(ret < 0 || block_job_is_cancelled(&s->common));
> >>
> >> This assertion gets weaker in the case where force isn't provided, is
> >> that desired?
> >>
> > yes. if force quit is used, the following condition can be true
> >
> > (ret >= 0) && (s->synced) && (block_job_is_cancelled(&s->common))
> >
> > so the above assert should be changed, or it will be failed.
> >
>
> What I mean is that when we *don't* use force quit, the assertion here
> is weakened. You can work the force conditional in here:
>
> ret < 0 || (block_job_is_cancelled(job) && (job->force || !s->synced))
>
> which preserves the stricter assertion when force isn't provided.
>
you are right, will change.
> >>> assert(need_drain);
> >>> mirror_wait_for_all_io(s);
> >>> }
> >>> diff --git a/blockdev.c b/blockdev.c
> >>> index 8e977ee..039f156 100644
> >>> --- a/blockdev.c
> >>> +++ b/blockdev.c
> >>> @@ -145,7 +145,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
> >>> aio_context_acquire(aio_context);
> >>>
> >>> if (bs->job) {
> >>> - block_job_cancel(bs->job);
> >>> + block_job_cancel(bs->job, false);
> >>> }
> >>>
> >>> aio_context_release(aio_context);
> >>> @@ -3802,7 +3802,7 @@ void qmp_block_job_cancel(const char *device,
> >>> }
> >>>
> >>> trace_qmp_block_job_cancel(job);
> >>> - block_job_cancel(job);
> >>> + block_job_cancel(job, force);
> >>> out:
> >>> aio_context_release(aio_context);
> >>> }
> >>> diff --git a/blockjob.c b/blockjob.c
> >>> index f5cea84..0aacb50 100644
> >>> --- a/blockjob.c
> >>> +++ b/blockjob.c
> >>> @@ -365,7 +365,7 @@ static void block_job_completed_single(BlockJob *job)
> >>> block_job_unref(job);
> >>> }
> >>>
> >>> -static void block_job_cancel_async(BlockJob *job)
> >>> +static void block_job_cancel_async(BlockJob *job, bool force)
> >>> {
> >>> if (job->iostatus != BLOCK_DEVICE_IO_STATUS_OK) {
> >>> block_job_iostatus_reset(job);
> >>> @@ -376,6 +376,7 @@ static void block_job_cancel_async(BlockJob *job)
> >>> job->pause_count--;
> >>> }
> >>> job->cancelled = true;
> >>> + job->force = force;
> >>> }
> >>>
> >>
> >> I suppose this is okay; we're setting a permanent property of the job as
> >> part of a limited operation.
> >>
> >> Granted, the job should disappear afterwards, so it should never
> >> conflict, but it made me wonder for a moment.
> >>
> >> What happens if we cancel with force = true and then immediately cancel
> >> again with force = false? What happens? Can it cause issues?
> >>
> >
> > Indeed. It can be fixed by:
> >
> > if (!job->force)
> > job->force = force
> >
> > it's that ok ?
> >
>
> I think so, yes. or job->force |= force, or your preferred equivalent
> for only allowing false-->true here.
>
> >>> static int block_job_finish_sync(BlockJob *job,
> >>> @@ -437,7 +438,7 @@ static void block_job_completed_txn_abort(BlockJob
> >>> *job)
> >>> * on the caller, so leave it. */
> >>> QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
> >>> if (other_job != job) {
> >>> - block_job_cancel_async(other_job);
> >>> + block_job_cancel_async(other_job, true);
> >>
> >> What's the rationale for deciding that transactional cancellations are
> >> always force=true?
> >>
> >
> > no particular reason, just try to speed up the abort process. :)
> >
>
> I'd prefer we not change the semantics of how transactions fail without
> a stronger justification than wanting to make it faster!
>
OK. I will send the v2, thanks!
Liang
> >> Granted, this doesn't matter currently because mirror isn't a
> >> transactional job, but that makes the decision all the more curious.
> >>
> >>> }
> >>> }
> >
> > Thanks for your comments.
> >
> > Liang
> >