From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH v9 07/11] block: Add QMP support for streaming to an intermediate layer
Date: Tue, 17 May 2016 16:47:57 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
On 17.05.2016 at 16:26, Alberto Garcia wrote:
> On Thu 12 May 2016 05:28:34 PM CEST, Kevin Wolf wrote:
> > On 12.05.2016 at 17:13, Alberto Garcia wrote:
> >> On Thu 12 May 2016 05:04:51 PM CEST, Kevin Wolf wrote:
> >> > On 12.05.2016 at 15:47, Alberto Garcia wrote:
> >> >> On Tue 03 May 2016 03:48:47 PM CEST, Kevin Wolf wrote:
> >> >> > On 03.05.2016 at 15:33, Alberto Garcia wrote:
> >> >> >> On Tue 03 May 2016 03:23:24 PM CEST, Kevin Wolf wrote:
> >> >> >> c) we fix bdrv_reopen() so we can actually run both jobs at the same
> >> >> >>    time. I'm wondering if pausing all block jobs between
> >> >> >>    bdrv_reopen_prepare() and bdrv_reopen_commit() would do the
> >> >> >>    trick. Opinions?
> >> >> >> >
> >> >> >> > I would have to read up the details of the problem again, but I
> >> >> >> > think with bdrv_drained_begin/end() we actually have the right
> >> >> >> > tool now to fix it properly. We may need to pull up the drain
> >> >> >> > (bdrv_drain_all() today) from bdrv_reopen_multiple() to its caller
> >> >> >> > and just assert it in the function itself, but there shouldn't be
> >> >> >> > much more to it than that.
> >> >> >>
> >> >> >> I think that's not enough, see point 2) here:
> >> >> >>
> >> >> >> https://lists.gnu.org/archive/html/qemu-block/2015-12/msg00180.html
> >> >> >>
> >> >> >> "I've been taking a look at the bdrv_drained_begin/end() API, but
> >> >> >> as I
> >> >> >> understand it it prevents requests from a different AioContext.
> >> >> >> Since all BDS in the same chain share the same context it does not
> >> >> >> really help here."
> >> >> >
> >> >> > Yes, that's the part I meant with pulling up the calls.
> >> >> >
> >> >> > If I understand correctly, the problem is that first
> >> >> > bdrv_reopen_queue() queues a few BDSes for reopen, then
> >> >> > bdrv_drain_all() completes all running requests and can indirectly
> >> >> > trigger a graph modification, and then bdrv_reopen_multiple() uses
> >> >> > the queue which doesn't match reality any more.
> >> >> >
> >> >> > The solution to that should be simply changing the order of things:
> >> >> >
> >> >> > 1. bdrv_drained_begin()
> >> >> > 2. bdrv_reopen_queue()
> >> >> > 3. bdrv_reopen_multiple()
> >> >> >    * Instead of bdrv_drain_all(), assert that no requests are pending
> >> >> >    * We don't run requests, so we can't complete a block job and
> >> >> >      manipulate the graph any more
> >> >> > 4. then bdrv_drained_end()
> >> >>
> >> >> This doesn't work. Here's what happens:
> >> >>
> >> >> 1) Block job (a) starts (block-stream).
> >> >>
> >> >> 2) Block job (b) starts (block-stream, or block-commit).
> >> >>
> >> >> 3) job (b) calls bdrv_reopen() and does the drain call.
> >> >>
> >> >> 4) job (b) creates reopen_queue and calls bdrv_reopen_multiple().
> >> >> There are no pending requests at this point, but job (a) is sleeping.
> >> >>
> >> >> 5) bdrv_reopen_multiple() iterates over reopen_queue and calls
> >> >> bdrv_reopen_prepare() -> bdrv_flush() -> bdrv_co_flush() ->
> >> >> qemu_coroutine_yield().
> >> >
> >> > I think between here and the next step is what I don't understand.
> >> >
> >> > bdrv_reopen_multiple() is not called in coroutine context, right? All
> >> > block jobs use block_job_defer_to_main_loop() before they call
> >> > bdrv_reopen(), as far as I can see. So bdrv_flush() shouldn't take the
> >> > shortcut, but use a nested event loop.
> >>
> >> When bdrv_flush() is not called in coroutine context it does
> >> qemu_coroutine_create() + qemu_coroutine_enter().
> >
> > Right, but if the coroutine yields, we jump back to the caller, which
> > looks like this:
> >
> >     co = qemu_coroutine_create(bdrv_flush_co_entry);
> >     qemu_coroutine_enter(co, &rwco);
> >     while (rwco.ret == NOT_DONE) {
> >         aio_poll(aio_context, true);
> >     }
> >
> > So this loops until the flush has completed. The only way I can see how
> > something else (job (a)) can run is if aio_poll() calls it.
>
> If I'm not wrong that can happen if job (a) is sleeping.
>
> If job (a) is launched with a rate limit, then the bdrv_drain() call
> will not drain it completely but return when the block job hits the I/O
> limit -> block_job_sleep_ns() -> co_aio_sleep_ns().
So timers. Currently, bdrv_drained_begin/end disables the FD handlers,
but not timers. Not sure if that's easily fixable in a generic way,
though, so maybe we need to disable timers one by one.
Pausing a block job does not actually cause the timer to be cancelled.
Maybe we need to change block_job_sleep_ns() so that the function
returns only when the job isn't paused any more. Maybe something like:
    job->busy = false;
    if (!block_job_is_paused(job)) {
        co_aio_sleep_ns(bdrv_get_aio_context(job->bs), type, ns);
        /* The job could be paused now */
    }
    if (block_job_is_paused(job)) {
        qemu_coroutine_yield();
    }
    job->busy = true;
Then it should be safe to assume that if block jobs are paused (even if
they were sleeping), they won't issue new requests any more until they
are resumed.
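
For completeness, the whole function might then look roughly like this.
This is only an untested sketch: I'm assuming the current
block_job_sleep_ns() signature stays the same and that the assert and
the block_job_is_cancelled() check at the top are kept as they are
today.

    void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
    {
        assert(job->busy);

        /* Check cancellation *before* setting busy = false, as today */
        if (block_job_is_cancelled(job)) {
            return;
        }

        job->busy = false;
        if (!block_job_is_paused(job)) {
            co_aio_sleep_ns(bdrv_get_aio_context(job->bs), type, ns);
            /* The job could have been paused while we were sleeping */
        }
        if (block_job_is_paused(job)) {
            /* Don't return to the job until block_job_resume() reenters us */
            qemu_coroutine_yield();
        }
        job->busy = true;
    }

That way block_job_sleep_ns() only ever returns while the job is not
paused, which is the property a drained section would rely on.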
> >> > I don't fully understand the problem yet, but as a shot in the
> >> > dark, would pausing block jobs in bdrv_drained_begin() help?
> >>
> >> Yeah, my impression is that pausing all jobs during bdrv_reopen()
> >> should be enough.
> >
> > If you base your patches on top of my queue (which I think you already
> > do for the greatest part), the nicest way to implement this would
> > probably be that BlockBackends give their users a callback for
> > .drained_begin/end and the jobs implement that as pausing themselves.
> >
> > We could, of course, directly pause block jobs in
> > bdrv_drained_begin(), but that would feel a bit hackish. So maybe do
> > that for a quick attempt to see whether it helps, and if it does, we
> > can write the real thing then.
>
> I'll try that.
>
> On a related note, bdrv_drain_all() already pauses all block jobs. The
> problem is that when it ends it resumes them, so it can happen that
> there are pending I/O requests again before bdrv_drain_all() returns.
>
> That sounds like a problem to me, or am I overlooking anything?
Essentially every bdrv_drain(_all) call is a problem; it should always
be a bdrv_drained_begin/end section. The only exception is where you
want to drain only requests from a single source that you know doesn't
send new requests (e.g. draining guest requests in vm_stop).
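
To make that concrete, here is roughly what I would expect the reopen
path to look like once the drain is pulled up into the caller. Again
only an untested sketch using today's signatures, and it assumes that
bdrv_drained_begin/end also pauses and resumes the block jobs as
discussed above.

    int bdrv_reopen(BlockDriverState *bs, int bdrv_flags, Error **errp)
    {
        int ret;
        Error *local_err = NULL;
        BlockReopenQueue *queue;

        /* Quiesce everything first: no requests, no running block jobs */
        bdrv_drained_begin(bs);

        /* Build the queue only when nothing can modify the graph any more */
        queue = bdrv_reopen_queue(NULL, bs, NULL, bdrv_flags);

        /* bdrv_reopen_multiple() would only assert that no requests are
         * pending instead of calling bdrv_drain_all() itself */
        ret = bdrv_reopen_multiple(queue, &local_err);
        if (local_err != NULL) {
            error_propagate(errp, local_err);
        }

        bdrv_drained_end(bs);
        return ret;
    }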
Kevin