
Re: [Qemu-block] segfault in parallel blockjobs (iotest 30)


From: John Snow
Subject: Re: [Qemu-block] segfault in parallel blockjobs (iotest 30)
Date: Thu, 16 Nov 2017 16:56:58 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0


On 11/16/2017 10:54 AM, Alberto Garcia wrote:
> On Wed 15 Nov 2017 05:31:20 PM CET, Anton Nefedov wrote:
>>> I have the impression that one major source of headaches is the fact
>>> that the reopen queue contains nodes that don't need to be reopened at
>>> all. Ideally this should be detected early on in bdrv_reopen_queue(), so
>>> there's no chance that the queue contains nodes used by a different
>>> block job. If we had that then op blockers should be enough to prevent
>>> these things. Or am I missing something?
>>>
>> After applying Max's patch I tried a similar approach, that is,
>> keeping BDSes referenced while they are in the reopen queue.  Now I
>> get the stream job hanging: somehow one blk_root_drained_begin() is
>> not paired with a blk_root_drained_end(), so the job stays paused.
> 
> I can see this if I apply Max's patch and keep refs to BDSs in the
> reopen queue:
> 
> #0  block_job_pause (...) at blockjob.c:130
> #1  0x000055c143cb586d in block_job_drained_begin (...) at blockjob.c:227
> #2  0x000055c143d08067 in blk_set_dev_ops (...) at block/block-backend.c:887
> #3  0x000055c143cb69db in block_job_create (...) at blockjob.c:678
> #4  0x000055c143d17c0c in mirror_start_job (...) at block/mirror.c:1177
> 
> There's an ops->drained_begin(opaque) call in blk_set_dev_ops() that
> doesn't seem to be paired. And when you call block_job_start(), it
> yields immediately, waiting for a resume that never arrives.
> 
> John, this change was yours (f4d9cc88ee69a5b04). Any idea?
> 
> Berto
> 

The idea at the time was that if you tell the BlockBackend to drain and
then attach a job to it, you'd end up with a mismatched begin/end pair
once you go to *end* the drained region: the job would receive a
drained_end it never saw the matching begin for.
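
On the job side those drain notifications are just pause/resume.
Roughly, as a trimmed-down sketch of the dev_ops that block_job_create()
installs via blk_set_dev_ops() (details omitted):

    /* blockjob.c (sketch): drain notifications map onto job
     * pause/resume. */
    static void block_job_drained_begin(void *opaque)
    {
        BlockJob *job = opaque;
        block_job_pause(job);      /* bumps job->pause_count */
    }

    static void block_job_drained_end(void *opaque)
    {
        BlockJob *job = opaque;
        block_job_resume(job);     /* drops pause_count, re-enters job */
    }

    static const BlockDevOps block_job_dev_ops = {
        .drained_begin = block_job_drained_begin,
        .drained_end   = block_job_drained_end,
    };

So if one side of the pair goes missing, pause_count ends up out of
balance; in the hang above, a begin with no matching end keeps the job
paused forever.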

To allow for some flexibility and to make sure that you *didn't* end up
with a mismatched begin/end call, what I did was: if you attach dev_ops
to an already drained backend (i.e. we "missed our chance" to issue the
drained_begin), we play catch-up and issue the drained_begin call right
there.
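
That catch-up is the ops->drained_begin(opaque) call you're pointing at.
Roughly, as a sketch of blk_set_dev_ops() from around that commit
(unrelated details omitted):

    void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
                         void *opaque)
    {
        blk->dev_ops = ops;
        blk->dev_opaque = opaque;

        /* The backend is already quiesced, so the new dev_ops missed
         * the drained_begin notification; deliver it now to keep
         * begin/end balanced. */
        if (blk->quiesce_counter && ops->drained_begin) {
            ops->drained_begin(opaque);
        }
    }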

There's no matching call there, because I anticipated that whoever
initially bumped that quiesce_counter would be the one issuing the
drained_end, which would then be propagated according to the dev_ops
structure in place.
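
In other words, the end of the drain is expected to arrive through the
BlockBackend's root child and be forwarded to the dev_ops, along these
lines (a simplified sketch of blk_root_drained_end()):

    static void blk_root_drained_end(BdrvChild *child)
    {
        BlockBackend *blk = child->opaque;

        assert(blk->quiesce_counter);
        if (--blk->quiesce_counter == 0) {
            /* Last drained section has ended: let the attached device
             * (here, the block job) resume. */
            if (blk->dev_ops && blk->dev_ops->drained_end) {
                blk->dev_ops->drained_end(blk->dev_opaque);
            }
        }
    }

If that drained_end never reaches the job's BlockBackend, the resume
never happens, which would match the hang Anton describes.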


