
aio-poll dead-lock


From: Vladimir Sementsov-Ogievskiy
Subject: aio-poll dead-lock
Date: Thu, 17 Dec 2020 15:16:27 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.1

I don't think this is anything new, but just to keep it in mind:

blk_prw does its polling with the in-flight counter already increased.

So, if some BH dispatched from that poll wants to drain, we are guaranteed to
dead-lock in the nested aio_poll loop.
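
A simplified sketch of the two polling loops involved (pseudo-code paraphrasing
block/block-backend.c:blk_prw() and block/io.c:bdrv_do_drained_begin(), not the
exact code):

/* blk_prw(): synchronous wrapper, polls with in-flight already raised */
blk_inc_in_flight(blk);                      /* blk->in_flight == 1 */
co = qemu_coroutine_create(co_entry, &rwco);
bdrv_coroutine_enter(blk_bs(blk), co);
BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
    /* the aio_poll() inside BDRV_POLL_WHILE dispatches BHs; in the trace
     * below it runs job_exit() -> backup_clean() -> bdrv_backup_top_drop()
     * -> bdrv_drained_begin() */
blk_dec_in_flight(blk);    /* only reached once the poll above returns */

/* bdrv_do_drained_begin(), reached from that BH: polls until no parent
 * (including our BlockBackend) has requests in flight.  But blk->in_flight
 * can only drop back to 0 in the frame above, which is stuck behind us on
 * the stack, so this never finishes: */
BDRV_POLL_WHILE(bs, bdrv_drain_poll(bs, recursive, parent,
                                    ignore_bds_parents));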

Here is a backtrace (it comes from the Virtuozzo branch, so I don't have a
reproducer for master, but I'll probably return to this later):

#0  0x00007f895d751b56 in ppoll () at /lib64/libc.so.6
#1  0x0000558f664e371c in qemu_poll_ns (fds=0x558f6778e630, nfds=1, timeout=-1) 
at util/qemu-timer.c:335
#2  0x0000558f664bf5a5 in fdmon_poll_wait (ctx=0x558f67769480, 
ready_list=0x7ffe3d6be730, timeout=-1) at util/fdmon-poll.c:79
#3  0x0000558f664beed3 in aio_poll (ctx=0x558f67769480, blocking=true) at 
util/aio-posix.c:600
#4  0x0000558f663e8f82 in bdrv_do_drained_begin (bs=0x558f6855bcc0, 
recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at 
block/io.c:435
#5  0x0000558f663e9067 in bdrv_drained_begin (bs=0x558f6855bcc0) at 
block/io.c:441
#6  0x0000558f66411df3 in bdrv_backup_top_drop (bs=0x558f6855bcc0) at 
block/backup-top.c:296
#7  0x0000558f6640a0de in backup_clean (job=0x558f6814f130) at 
block/backup.c:109
#8  0x0000558f66372019 in job_clean (job=0x558f6814f130) at job.c:678
#9  0x0000558f66372094 in job_finalize_single (job=0x558f6814f130) at job.c:694
#10 0x0000558f66370c41 in job_txn_apply (job=0x558f6814f130, fn=0x558f6637201c 
<job_finalize_single>) at job.c:158
#11 0x0000558f6637243b in job_do_finalize (job=0x558f6814f130) at job.c:803
#12 0x0000558f663725d8 in job_completed_txn_success (job=0x558f6814f130) at 
job.c:853
#13 0x0000558f66372678 in job_completed (job=0x558f6814f130) at job.c:866
#14 0x0000558f663726cb in job_exit (opaque=0x558f6814f130) at job.c:886
#15 0x0000558f664d48eb in aio_bh_call (bh=0x558f683ee370) at util/async.c:136
#16 0x0000558f664d49f5 in aio_bh_poll (ctx=0x558f67769480) at util/async.c:164
#17 0x0000558f664bf0c6 in aio_poll (ctx=0x558f67769480, blocking=true) at 
util/aio-posix.c:650
#18 0x0000558f663d357a in blk_prw (blk=0x558f677804d0, offset=0, buf=0x558f67f34000 '\253' 
<repeats 200 times>..., bytes=65536, co_entry=0x558f663d339f <blk_write_entry>, 
flags=0) at block/block-backend.c:1336
#19 0x0000558f663d3be3 in blk_pwrite (blk=0x558f677804d0, offset=0, 
buf=0x558f67f34000, count=65536, flags=0) at block/block-backend.c:1502
#20 0x0000558f66374355 in do_pwrite (blk=0x558f677804d0, buf=0x558f67f34000 '\253' 
<repeats 200 times>..., offset=0, bytes=65536, flags=0, total=0x7ffe3d6bec38) 
at qemu-io-cmds.c:551
#21 0x0000558f6637566a in write_f (blk=0x558f677804d0, argc=4, 
argv=0x558f685600d0) at qemu-io-cmds.c:1192
#22 0x0000558f66373244 in command (blk=0x558f677804d0, ct=0x558f67544a58, 
argc=4, argv=0x558f685600d0) at qemu-io-cmds.c:118
#23 0x0000558f66377d80 in qemuio_command (blk=0x558f677804d0, cmd=0x558f67ff0ee0 
"write -P0xab 0 64k") at qemu-io-cmds.c:2465
#24 0x0000558f6608badd in hmp_qemu_io (mon=0x7ffe3d6bee50, 
qdict=0x558f68125010) at block/monitor/block-hmp-cmds.c:628
#25 0x0000558f662c76b2 in handle_hmp_command (mon=0x7ffe3d6bee50, cmdline=0x7f8948007688 
"drive0 \"write -P0xab 0 64k\"") at monitor/hmp.c:1082
#26 0x0000558f65fb12c6 in qmp_human_monitor_command (command_line=0x7f8948007680 "qemu-io 
drive0 \"write -P0xab 0 64k\"", has_cpu_index=false, cpu_index=0, 
errp=0x7ffe3d6bef58)
    at /work/src/qemu/vz-8.0/monitor/misc.c:141
#27 0x0000558f662facb1 in qmp_marshal_human_monitor_command 
(args=0x7f8948007930, ret=0x7ffe3d6befe0, errp=0x7ffe3d6befd8) at 
qapi/qapi-commands-misc.c:653
#28 0x0000558f66468ff9 in qmp_dispatch (cmds=0x558f66a9bd10 <qmp_commands>, 
request=0x7f8948005600, allow_oob=false) at qapi/qmp-dispatch.c:155
#29 0x0000558f662c416c in monitor_qmp_dispatch (mon=0x558f67790ab0, 
req=0x7f8948005600) at monitor/qmp.c:145
#30 0x0000558f662c451b in monitor_qmp_bh_dispatcher (data=0x0) at 
monitor/qmp.c:234
#31 0x0000558f664d48eb in aio_bh_call (bh=0x558f67594bb0) at util/async.c:136
#32 0x0000558f664d49f5 in aio_bh_poll (ctx=0x558f675945f0) at util/async.c:164
#33 0x0000558f664be7ca in aio_dispatch (ctx=0x558f675945f0) at 
util/aio-posix.c:380
#34 0x0000558f664d4e26 in aio_ctx_dispatch (source=0x558f675945f0, 
callback=0x0, user_data=0x0) at util/async.c:306
#35 0x00007f895f6bf570 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#36 0x0000558f664dcd13 in glib_pollfds_poll () at util/main-loop.c:217
#37 0x0000558f664dcd8d in os_host_main_loop_wait (timeout=985763000) at 
util/main-loop.c:240
#38 0x0000558f664dce92 in main_loop_wait (nonblocking=0) at util/main-loop.c:516
#39 0x0000558f65fcfe66 in qemu_main_loop () at 
/work/src/qemu/vz-8.0/softmmu/vl.c:1676
#40 0x0000558f664625e4 in main (argc=20, argv=0x7ffe3d6bf468, 
envp=0x7ffe3d6bf510) at /work/src/qemu/vz-8.0/softmmu/main.c:49

As far as I know, the only way to fight this is to move things into coroutine
context. So I think that moving backup_clean to a coroutine is necessary.
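
For reference, the coroutine path in bdrv_do_drained_begin() (block/io.c,
paraphrased, arguments trimmed) doesn't poll in place but yields, so the
waiting is done by the outer event loop rather than by a nested aio_poll under
blk_prw:

if (qemu_in_coroutine()) {
    /* yield back to the event loop that entered the coroutine; the
     * actual waiting then happens at the outer level, not nested */
    bdrv_co_yield_to_drain(bs, true, recursive, parent, ...);
    return;
}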

Any thoughts?

--
Best regards,
Vladimir


