[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] hw/nvme: Fix deallocate when metadata is present
From: |
Klaus Jensen |
Subject: |
Re: [PATCH] hw/nvme: Fix deallocate when metadata is present |
Date: |
Fri, 3 Jun 2022 22:03:18 +0200 |
On Jun 3 13:14, Jonathan Derrick wrote:
> When metadata is present in the namespace and deallocates are issued, the
> first
> deallocate could fail to zero the block range, resulting in another
> deallocation to be issued. Normally after the deallocation completes and the
> range is checked for zeroes, a deallocation is then issued for the metadata
> space. In the failure case where the range is not zeroed, deallocation is
> reissued for the block range (and followed with metadata deallocation), but
> the
> original range deallocation task will also issue a metadata deallocation:
>
> nvme_dsm_cb()
> *range deallocation*
> nvme_dsm_md_cb()
> if (nvme_block_status_all()) (range deallocation failure)
> nvme_dsm_cb()
> *range deallocation*
> nvme_dsm_md_cb()
> if (nvme_block_status_all()) (no failure)
> *metadata deallocation*
> *metadata deallocation*
>
> This sequence results in reentry of nvme_dsm_cb() before the metadata has been
> deallocated. During reentry, the metadata is deallocated in the reentrant
> task.
> nvme_dsm_bh() is called which deletes and sets iocb->bh to NULL. When reentry
> returns from nvme_dsm_cb(), metadata deallocation takes place again, and
> results in a null pointer dereference on the iocb->bh:
>
> BH deletion:
> #0 nvme_dsm_bh (opaque=0x55ef893e2f10) at ../hw/nvme/ctrl.c:2316
> #1 0x000055ef868eb333 in aio_bh_call (bh=0x55ef8a441b30) at
> ../util/async.c:141
> #2 0x000055ef868eb441 in aio_bh_poll (ctx=0x55ef892c6e40) at
> ../util/async.c:169
> #3 0x000055ef868d2789 in aio_dispatch (ctx=0x55ef892c6e40) at
> ../util/aio-posix.c:415
> #4 0x000055ef868eb896 in aio_ctx_dispatch (source=0x55ef892c6e40,
> callback=0x0, user_data=0x0) at ../util/async.c:311
> #5 0x00007f5bfe4ab17d in g_main_context_dispatch () at
> /lib/x86_64-linux-gnu/libglib-2.0.so.0
> #6 0x000055ef868fcd98 in glib_pollfds_poll () at ../util/main-loop.c:232
> #7 0x000055ef868fce16 in os_host_main_loop_wait (timeout=0) at
> ../util/main-loop.c:255
> #8 0x000055ef868fcf27 in main_loop_wait (nonblocking=0) at
> ../util/main-loop.c:531
> #9 0x000055ef864a2442 in qemu_main_loop () at ../softmmu/runstate.c:726
> #10 0x000055ef860f957a in main (argc=29, argv=0x7ffdc9705508,
> envp=0x7ffdc97055f8) at ../softmmu/main.c:50
>
> nvme_dsm_cb() called for metadata after nvme_dsm_bh() completes from
> reentrant task:
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x000055ef868eb07c in aio_bh_enqueue (bh=0x0, new_flags=2) at
> ../util/async.c:70
> 70 AioContext *ctx = bh->ctx;
> (gdb) backtrace
> #0 0x000055ef868eb07c in aio_bh_enqueue (bh=0x0, new_flags=2) at
> ../util/async.c:70
> #1 0x000055ef868eb4cf in qemu_bh_schedule (bh=0x0) at ../util/async.c:186
> #2 0x000055ef862db21e in nvme_dsm_cb (opaque=0x55ef897b41a0, ret=0) at
> ../hw/nvme/ctrl.c:2423
> #3 0x000055ef8665a662 in blk_aio_complete (acb=0x55ef89c6d8c0) at
> ../block/block-backend.c:1419
> #4 0x000055ef8665a940 in blk_aio_write_entry (opaque=0x55ef89c6d8c0) at
> ../block/block-backend.c:1486
> #5 0x000055ef868edcf2 in coroutine_trampoline (i0=-536848976, i1=32602) at
> ../util/coroutine-ucontext.c:173
> #6 0x00007f5bfe0bc510 in __start_context () at
> ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
> #7 0x00007f5bf757bb40 in ()
> #8 0x0000000000000000 in ()
>
> The fix is to return when an nvme_dsm_cb() is reentered due to failure to
> deallocate the block range, so that metadata deallocate is then only issued in
> the reentrant task and prevent doing it again when the reentrant task returns
> to the original task.
>
> Reproduction steps (with emulated namespace):
> nvme format --lbaf=1 -f /dev/nvme0n1
> mkfs.ext4 /dev/nvme0n1
> mkfs.ext4 -F /dev/nvme0n1
>
> Signed-off-by: Francis Pravin AntonyX Michael Raj
> <francis.michael@solidigm.com>
> Signed-off-by: Michael Kropaczek <michael.kropaczek@solidigm.com>
> Signed-off-by: Jonathan Derrick <jonathan.derrick@solidigm.com>
> ---
> hw/nvme/ctrl.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 03760ddeae..74540a03d5 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -2372,6 +2372,7 @@ static void nvme_dsm_md_cb(void *opaque, int ret)
> }
>
> nvme_dsm_cb(iocb, 0);
> + return;
> }
>
> iocb->aiocb = blk_aio_pwrite_zeroes(ns->blkconf.blk, nvme_moff(ns, slba),
> --
> 2.25.1
>
Yeah, thanks for the elaborate analysis!
A fix for this has been lingering in nvme-next. I sincerely apologize
for not sending the pull request to master earlier, which might have
saved you the trouble of tracking this down.
I just sent the pull request to Peter.
signature.asc
Description: PGP signature