[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: [PATCH v4] blockjob: Fix crash with IOthread when block commit after
From: |
仇大玉 |
Subject: |
RE: [PATCH v4] blockjob: Fix crash with IOthread when block commit after snapshot |
Date: |
Mon, 1 Feb 2021 02:40:45 +0000 |
Any comments?
It's really a bug and can cause the qemu to segmentfault.
Thanks,
Michael
-----Original Message-----
From: 仇大玉
Sent: 2021年1月28日 13:16
To: qemu-block@nongnu.org; qemu-devel@nongnu.org
Cc: kwolf@redhat.com; mreitz@redhat.com; jsnow@redhat.com; 08005325@163.com
Subject: RE: [PATCH v4] blockjob: Fix crash with IOthread when block commit
after snapshot
Any comments?
-----Original Message-----
From: 08005325@163.com <08005325@163.com>
Sent: 2021年1月28日 9:31
To: kwolf@redhat.com; mreitz@redhat.com; jsnow@redhat.com
Cc: qemu-block@nongnu.org; qemu-devel@nongnu.org; 仇大玉 <qiudayu@huayun.com>
Subject: [PATCH v4] blockjob: Fix crash with IOthread when block commit after
snapshot
From: Michael Qiu <qiudayu@huayun.com>
v4: rebase to latest code
v3: reformat the commit log, remove duplicate content
v2: modify the coredump backtrace within commit log with the newest
qemu with master branch
Currently, if guest has workloads, IO thread will acquire aio_context lock
before do io_submit, it leads to segmentfault when do block commit after
snapshot. Just like below:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f7c7d91f700 (LWP 99907)] 0x00005576d0f65aab in
bdrv_mirror_top_pwritev at ../block/mirror.c:1437
1437 ../block/mirror.c: No such file or directory.
(gdb) p s->job
$17 = (MirrorBlockJob *) 0x0
(gdb) p s->stop
$18 = false
(gdb) bt
Switch to qemu main thread:
/lib/../lib64/libpthread.so.0
/lib/../lib64/libpthread.so.0
../util/qemu-thread-posix.c:79
qapi/qapi-commands-block-core.c:346
../qapi/qmp-dispatch.c:110
/lib/../lib64/libglib-2.0.so.0
In IO thread when do bdrv_mirror_top_pwritev, the job is NULL, and stop field
is false, this means the MirrorBDSOpaque "s" object has not been initialized
yet, and this object is initialized by block_job_create(), but the initialize
process is stuck in acquiring the lock.
The rootcause is that qemu do release/acquire when hold the lock, at the same
time, IO thread get the lock after release stage, and the crash occured.
Actually, in this situation, job->job.aio_context will not equal to
qemu_get_aio_context(), and will be the same as bs->aio_context, thus, no need
to release the lock, becasue bdrv_root_attach_child() will not change the
context.
This patch fix this issue.
Signed-off-by: Michael Qiu <qiudayu@huayun.com>
---
blockjob.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/blockjob.c b/blockjob.c
index 98ac8af982..51a09f3b60 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -214,13 +214,15 @@ int block_job_add_bdrv(BlockJob *job, const char *name,
BlockDriverState *bs,
BdrvChild *c;
bdrv_ref(bs);
- if (job->job.aio_context != qemu_get_aio_context()) {
+ if (bdrv_get_aio_context(bs) != job->job.aio_context &&
+ job->job.aio_context != qemu_get_aio_context()) {
aio_context_release(job->job.aio_context);
}
c = bdrv_root_attach_child(bs, name, &child_job, 0,
job->job.aio_context, perm, shared_perm, job,
errp);
- if (job->job.aio_context != qemu_get_aio_context()) {
+ if (bdrv_get_aio_context(bs) != job->job.aio_context &&
+ job->job.aio_context != qemu_get_aio_context()) {
aio_context_acquire(job->job.aio_context);
}
if (c == NULL) {
--
2.22.0