qemu-block

Re: iotest 030 SIGSEGV


From: Hanna Reitz
Subject: Re: iotest 030 SIGSEGV
Date: Thu, 14 Oct 2021 15:20:41 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.1.0

On 13.10.21 23:50, John Snow wrote:
In trying to replace the QMP library backend, I have now twice stumbled upon a SIGSEGV in iotest 030 in the last three weeks or so.

I didn't have debug symbols on at the time, so I've got only this stack trace:

(gdb) thread apply all bt

Thread 8 (Thread 0x7f0a6b8c4640 (LWP 1873554)):
#0  0x00007f0a748a53ff in poll () at /lib64/libc.so.6
#1  0x00007f0a759bfa36 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
#2  0x00007f0a7596d163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3  0x0000557dac31d121 in iothread_run (opaque=opaque@entry=0x557dadd98800) at ../../iothread.c:73
#4  0x0000557dac4d7f89 in qemu_thread_start (args=0x7f0a6b8c3650) at ../../util/qemu-thread-posix.c:557
#5  0x00007f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x00007f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 7 (Thread 0x7f0a6b000640 (LWP 1873555)):
#0  0x00007f0a747ed7d2 in sigtimedwait () at /lib64/libc.so.6
#1  0x00007f0a74b72cdc in sigwait () at /lib64/libpthread.so.0
#2  0x0000557dac2e403b in dummy_cpu_thread_fn (arg=arg@entry=0x557dae041c10) at ../../accel/dummy-cpus.c:46
#3  0x0000557dac4d7f89 in qemu_thread_start (args=0x7f0a6afff650) at ../../util/qemu-thread-posix.c:557
#4  0x00007f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 6 (Thread 0x7f0a56afa640 (LWP 1873582)):
#0  0x00007f0a74b71308 in do_futex_wait.constprop () at /lib64/libpthread.so.0
#1  0x00007f0a74b71433 in __new_sem_wait_slow.constprop.0 () at /lib64/libpthread.so.0
#2  0x0000557dac4d8f1f in qemu_sem_timedwait (sem=sem@entry=0x557dadd62878, ms=ms@entry=10000) at ../../util/qemu-thread-posix.c:327
#3  0x0000557dac4f5ac4 in worker_thread (opaque=opaque@entry=0x557dadd62800) at ../../util/thread-pool.c:91
#4  0x0000557dac4d7f89 in qemu_thread_start (args=0x7f0a56af9650) at ../../util/qemu-thread-posix.c:557
#5  0x00007f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x00007f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 5 (Thread 0x7f0a57dff640 (LWP 1873580)):
#0  0x00007f0a74b71308 in do_futex_wait.constprop () at /lib64/libpthread.so.0
#1  0x00007f0a74b71433 in __new_sem_wait_slow.constprop.0 () at /lib64/libpthread.so.0
#2  0x0000557dac4d8f1f in qemu_sem_timedwait (sem=sem@entry=0x557dadd62878, ms=ms@entry=10000) at ../../util/qemu-thread-posix.c:327
#3  0x0000557dac4f5ac4 in worker_thread (opaque=opaque@entry=0x557dadd62800) at ../../util/thread-pool.c:91
#4  0x0000557dac4d7f89 in qemu_thread_start (args=0x7f0a57dfe650) at ../../util/qemu-thread-posix.c:557
#5  0x00007f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x00007f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 4 (Thread 0x7f0a572fb640 (LWP 1873581)):
#0  0x00007f0a74b7296f in pread64 () at /lib64/libpthread.so.0
#1  0x0000557dac39f18f in pread64 (__offset=<optimized out>, __nbytes=<optimized out>, __buf=<optimized out>, __fd=<optimized out>) at /usr/include/bits/unistd.h:105
#2  handle_aiocb_rw_linear (aiocb=aiocb@entry=0x7f0a573fc150, buf=0x7f0a6a47e000 '\377' <repeats 200 times>...) at ../../block/file-posix.c:1481
#3  0x0000557dac39f664 in handle_aiocb_rw (opaque=0x7f0a573fc150) at ../../block/file-posix.c:1521
#4  0x0000557dac4f5b54 in worker_thread (opaque=opaque@entry=0x557dadd62800) at ../../util/thread-pool.c:104
#5  0x0000557dac4d7f89 in qemu_thread_start (args=0x7f0a572fa650) at ../../util/qemu-thread-posix.c:557
#6  0x00007f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#7  0x00007f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 3 (Thread 0x7f0a714e8640 (LWP 1873552)):
#0  0x00007f0a748aaedd in syscall () at /lib64/libc.so.6
#1  0x0000557dac4d916a in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /home/jsnow/src/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x557dace1f1e8 <rcu_call_ready_event>) at ../../util/qemu-thread-posix.c:480
#3  0x0000557dac4e189a in call_rcu_thread (opaque=opaque@entry=0x0) at ../../util/rcu.c:258
#4  0x0000557dac4d7f89 in qemu_thread_start (args=0x7f0a714e7650) at ../../util/qemu-thread-posix.c:557
#5  0x00007f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x00007f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 2 (Thread 0x7f0a70ae5640 (LWP 1873553)):
#0  0x00007f0a74b71308 in do_futex_wait.constprop () at /lib64/libpthread.so.0
#1  0x00007f0a74b71433 in __new_sem_wait_slow.constprop.0 () at /lib64/libpthread.so.0
#2  0x0000557dac4d8f1f in qemu_sem_timedwait (sem=sem@entry=0x557dadd62878, ms=ms@entry=10000) at ../../util/qemu-thread-posix.c:327
#3  0x0000557dac4f5ac4 in worker_thread (opaque=opaque@entry=0x557dadd62800) at ../../util/thread-pool.c:91
#4  0x0000557dac4d7f89 in qemu_thread_start (args=0x7f0a70ae4650) at ../../util/qemu-thread-posix.c:557
#5  0x00007f0a74b683f9 in start_thread () at /lib64/libpthread.so.0
#6  0x00007f0a748b04c3 in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7f0a714ebec0 (LWP 1873551)):
#0  bdrv_inherits_from_recursive (parent=parent@entry=0x557dadfb5050, child=0xafafafafafafafaf, child@entry=0x557dae857010) at ../../block.c:3124
#1  bdrv_set_file_or_backing_noperm (parent_bs=parent_bs@entry=0x557dadfb5050, child_bs=child_bs@entry=0x557dae857010, is_backing=is_backing@entry=true, tran=tran@entry=0x557dae699d80, errp=errp@entry=0x7fff7b105d50) at ../../block.c:3157
#2  0x0000557dac3266b2 in bdrv_set_backing_noperm (errp=0x7fff7b105d50, tran=0x557dae699d80, backing_hd=0x557dae857010, bs=0x557dadfb5050) at ../../block.c:3240
#3  bdrv_set_backing_hd (bs=bs@entry=0x557dadfb5050, backing_hd=backing_hd@entry=0x557dae857010, errp=errp@entry=0x7fff7b105d50) at ../../block.c:3249
#4  0x0000557dac3a89d8 in stream_prepare (job=0x557daecc2f40) at ../../block/stream.c:74
#5  0x0000557dac32fcc6 in job_prepare (job=0x557daecc2f40) at ../../job.c:828
#6  job_txn_apply (fn=<optimized out>, job=0x557daecc2f40) at ../../job.c:158
#7  job_do_finalize (job=0x557daecc2f40) at ../../job.c:845
#8  0x0000557dac3300f2 in job_exit (opaque=0x557daecc2f40) at ../../job.c:932
#9  0x0000557dac4e7c64 in aio_bh_call (bh=0x557dadfc2fc0) at ../../util/async.c:141
#10 aio_bh_poll (ctx=ctx@entry=0x557dadd09a60) at ../../util/async.c:169
#11 0x0000557dac4d525e in aio_dispatch (ctx=0x557dadd09a60) at ../../util/aio-posix.c:381
#12 0x0000557dac4e78ce in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../../util/async.c:311
#13 0x00007f0a7596da9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#14 0x0000557dac4f33d0 in glib_pollfds_poll () at ../../util/main-loop.c:232
#15 os_host_main_loop_wait (timeout=0) at ../../util/main-loop.c:255
#16 main_loop_wait (nonblocking=nonblocking@entry=0) at ../../util/main-loop.c:531
#17 0x0000557dac1ef111 in qemu_main_loop () at ../../softmmu/runstate.c:726
#18 0x0000557dabf4b72e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../softmmu/main.c:50


The child=0xafafafafafafafaf bit looks a little concerning!

From commit 33fe08fcaf3773e2151bb60b4f9c62159a0c6633 it looks like we know this test is a little flaky. Do we have a bug tracker issue for this one?

Well, at least I think it’s known.  I’ve known about it for quite some time, I just wasn’t particularly keen on looking into it.

The crash here seems related to the fact that stream_prepare() gets `base` before invoking `bdrv_cor_filter_drop()`.  I think the graph modification in that function can mean that some other stream job is drained and perhaps completes, so that the node under `s->above_base` goes away.  I see two ways to fix that: either `bdrv_ref(base)` before `bdrv_cor_filter_drop()`, or getting `base` only after `bdrv_cor_filter_drop()`.

I don’t know which is better; I just did the latter for testing, and that crash no longer seems to appear.

However.

Now I get two other failures, first a seemingly harmless one:

+======================================================================
+FAIL: test_stream_parallel (__main__.TestParallelOps)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "tests/qemu-iotests/030", line 256, in test_stream_parallel
+    self.assert_qmp(result, 'return', {})
+  File "tests/qemu-iotests/iotests.py", line 977, in assert_qmp
+    result = self.dictpath(d, path)
+  File "tests/qemu-iotests/iotests.py", line 951, in dictpath
+    self.fail(f'failed path traversal for "{path}" in "{d}"')
+AssertionError: failed path traversal for "return" in "{'error': {'class': 'DeviceNotActive', 'desc': "Block job 'stream-node8' not found"}}"

No idea why this happens: there should be 2 to 4 MB of data per layer, and we limit the job speed to 1 kB/s, so the job should still be running by the time the test queries it.

But there is also one that is not harmless (a SIGSEGV) in the same test, whose backtrace looks like this:

(gdb) bt
#0  bdrv_is_inserted (bs=0x0) at ../block.c:6490
#1  0x0000556174ccb065 in bdrv_is_inserted (bs=<optimized out>) at ../block.c:6500
#2  0x0000556174ccb065 in bdrv_is_inserted (bs=<optimized out>) at ../block.c:6500
#3  0x0000556174ccb065 in bdrv_is_inserted (bs=<optimized out>) at ../block.c:6500
#4  0x0000556174ccb065 in bdrv_is_inserted (bs=<optimized out>) at ../block.c:6500
#5  0x0000556174ce8cea in blk_is_inserted (blk=blk@entry=0x5561783bb030) at ../block/block-backend.c:1912
#6  blk_is_available (blk=blk@entry=0x5561783bb030) at ../block/block-backend.c:1917
#7  0x0000556174ce8feb in blk_check_byte_request (blk=0x5561783bb030, offset=7864320, size=524288) at ../block/block-backend.c:1172
#8  0x0000556174ce90a7 in blk_do_preadv (blk=blk@entry=0x5561783bb030, offset=offset@entry=7864320, bytes=524288, qiov=qiov@entry=0x0, flags=flags@entry=BDRV_REQ_PREFETCH) at ../block/block-backend.c:1220
#9  0x0000556174ce91d7 in blk_co_preadv (blk=blk@entry=0x5561783bb030, offset=offset@entry=7864320, bytes=<optimized out>, qiov=qiov@entry=0x0, flags=flags@entry=BDRV_REQ_PREFETCH) at ../block/block-backend.c:1245
#10 0x0000556174d4ea2c in stream_populate (bytes=<optimized out>, offset=7864320, blk=0x5561783bb030) at ../block/stream.c:50
#11 stream_run (job=0x5561783bbc00, errp=<optimized out>) at ../block/stream.c:162

So we have a BdrvChild somewhere with a NULL .bs, which isn’t allowed (seems to be bs->backing).  I’m looking into this, but I have no leads yet.

Hanna
