
Re: Deadlock with ide_issue_trim and draining

From: Hanna Czenczek
Subject: Re: Deadlock with ide_issue_trim and draining
Date: Wed, 8 Mar 2023 19:03:58 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.7.1

On 07.03.23 14:44, Hanna Czenczek wrote:
On 07.03.23 13:22, Fiona Ebner wrote:
I am suspecting that commit 7e5cdb345f ("ide: Increment BB in-flight
counter for TRIM BH") introduced an issue in combination with draining.

 From a debug session on a customer's machine I gathered the following:
* The QEMU process hangs in aio_poll called during draining and doesn't
make progress.
* The in_flight counter for the BlockDriverState is 0 and for the
BlockBackend it is 1.
* There is a blk_aio_pdiscard_entry request in the BlockBackend's
queued_requests.
* The drive is attached via ahci.

I suspect that something like the following happened:

1. ide_issue_trim is called, and increments the in_flight counter.
2. ide_issue_trim_cb calls blk_aio_pdiscard.
3. somebody else starts draining.
4. ide_issue_trim_cb is called as the completion callback for
blk_aio_pdiscard.
5. ide_issue_trim_cb issues yet another blk_aio_pdiscard request.
6. The request is added to the wait queue via blk_wait_while_drained,
because draining has been started.
7. Nobody ever decrements the in_flight counter and draining can't finish.

Sounds about right.

The issue occurs very rarely and is difficult to reproduce, but with the
help of GDB, I'm able to do it rather reliably:
1. Use GDB to break on blk_aio_pdiscard.
2. Run mkfs.ext4 on a huge disk in the guest.
3. Issue a drive-backup QMP command after landing on the breakpoint.
4. Continue a few times in GDB.
5. After that I can observe the same situation as described above.

I'd be happy about suggestions for how to fix it. Unfortunately, I don't
see a clear-cut way at the moment. The only idea I have right now is to
change the code to issue all discard requests at the same time, but I
fear there might be pitfalls with that?

The point of 7e5cdb345f was that we need any in-flight count to accompany a set s->bus->dma->aiocb.  While blk_aio_pdiscard() is happening, we don’t necessarily need another count.  But we do need it while there is no blk_aio_pdiscard().

ide_issue_trim_cb() returns in two cases (and, recursively through its callers, leaves s->bus->dma->aiocb set):
1. After calling blk_aio_pdiscard(), which will keep an in-flight count,
2. After calling replay_bh_schedule_event() (i.e. qemu_bh_schedule()), which does not keep an in-flight count.

Perhaps we just need to move the blk_inc_in_flight() above the replay_bh_schedule_event() call?

While writing the commit message for this, I noticed it isn’t quite right: ide_cancel_dma_sync() drains s->blk only once, so once the in-flight counter goes to 0, s->blk is considered drained and ide_cancel_dma_sync() will go on to assert that s->bus->dma->aiocb is now NULL.  However, if we do have a blk_aio_pdiscard() in flight, the drain will wait only for that one to complete, not for the whole trim operation to complete, i.e. the next discard or ide_trim_bh_cb() will be scheduled, but neither will necessarily be run before blk_drain() returns.

I’ve attached a reproducer that issues two trim requests.  Before 7e5cdb345f, it makes qemu crash because the assertion fails (one or two of the blk_aio_pdiscard()s is drained, but the trim isn’t settled yet).  After 7e5cdb345f, qemu hangs because of what you describe (the second blk_aio_pdiscard() is enqueued, so the drain can’t make progress, resulting in a deadlock).  With my proposed fix, qemu crashes again.

(Reproducer is run like this:
$ qemu-system-x86_64 -drive if=ide,file=/tmp/test.bin,format=raw
)

What comes to my mind is either what you’ve proposed initially (issuing all discards simultaneously), or to still use my proposed fix, but also have ide_cancel_dma_sync() run blk_drain() in a loop until s->bus->dma->aiocb becomes NULL.  (Kind of like my original patch (https://lists.nongnu.org/archive/html/qemu-block/2022-01/msg00024.html), only that we can still use blk_drain() instead of aio_poll() because we increment the in-flight counter while the completion BH is scheduled.)


Attachment: test.bin
Description: Binary data

Attachment: test.asm
Description: Text document
