qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all


From: Alberto Garcia
Subject: Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
Date: Thu, 30 Nov 2017 15:35:11 +0100
User-agent: Notmuch/0.18.2 (http://notmuchmail.org) Emacs/24.4.1 (i586-pc-linux-gnu)

On Thu 30 Nov 2017 01:27:32 PM CET, Kevin Wolf wrote:

>> Destroying a paused block job during bdrv_reopen_multiple() has two
>> consequences:
>> 
>>    1) The references to the nodes involved in the job are released,
>>       possibly destroying some of them. If those nodes were in the
>>       reopen queue this would trigger the problem originally described
>>       in commit 40840e419be, crashing QEMU.
>
> This specific problem could be avoided by making the BDS reference in
> the reopen queue strong, i.e. bdrv_ref() in bdrv_reopen_queue_child()
> and bdrv_unref() only at the end of bdrv_reopen_multiple().

That is correct.

>>    2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
>>       not be doing all necessary bdrv_parent_drained_end() calls.
>
> If I understand correctly, you don't have a reproducer here.

That's unfortunately not correct.

You can use the very test case that I mentioned in the commit message:

   https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html

With that one, QEMU master crashes easily because of problem (1). If I
hold strong references in the reopen queue as you mentioned, the test
case hangs because of problem (2).

> It's certainly not a full solution because keeping a reference to a
> block job does not prevent it from completing, but only from being
> freed. Most block jobs do graph modifications, including dropping the
> references to nodes, already when they complete, not only when they
> are freed.

Yes but the block job itself holds additional references (thanks to
block_job_add_bdrv()).

> Now, while the above might have sounded like we have an easy solution
> in bdrv_reopen(), I see another problem that looks quite tough:
>
> 3) bdrv_reopen_queue() is a recursive operation that involves all
>    children that inherited options. If the graph changes between the
>    bdrv_reopen_queue() call and the end of bdrv_reopen_multiple(), the
>    calculated options (and today possibly permissions) don't actually
>    match the graph any more.
>
>    The requirement we really have is that between bdrv_reopen_queue()
>    and bdrv_reopen_multiple() no graph changes are made. Calling
>    bdrv_drain_all_begin() in bdrv_reopen_multiple() is too late.

Yes I agree that that should be the requirement.

Berto



reply via email to

[Prev in Thread] Current Thread [Next in Thread]