[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qe
From: |
Fiona Ebner |
Subject: |
[RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context |
Date: |
Wed, 5 Jun 2024 14:08:48 +0200 |
The fact that the snapshot_save_job_bh() is scheduled in the main
loop's qemu_aio_context AioContext means that it might get executed
during a vCPU thread's aio_poll(). But saving of the VM state cannot
happen while the guest or devices are active and can lead to assertion
failures. See issue #2111 for two examples. Avoid the problem by
scheduling the snapshot_save_job_bh() in the iohandler AioContext,
which is not polled by vCPU threads.
Solves Issue #2111.
This change also solves the following issue:
Since commit effd60c878 ("monitor: only run coroutine commands in
qemu_aio_context"), the 'snapshot-save' QMP call would not respond
right after starting the job anymore, but only after the job finished,
which can take a long time. The reason is, because after commit
effd60c878, do_qmp_dispatch_bh() runs in the iohandler AioContext.
When do_qmp_dispatch_bh() wakes the qmp_dispatch() coroutine, the
coroutine cannot be entered immediately anymore, but needs to be
scheduled to the main loop's qemu_aio_context AioContext. But
snapshot_save_job_bh() was scheduled first to the same AioContext and
thus gets executed first.
Buglink: https://gitlab.com/qemu-project/qemu/-/issues/2111
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
While initial smoke testing seems fine, I'm not familiar enough with
this to rule out any pitfalls with the approach. Any reason why
scheduling to the iohandler AioContext could be wrong here?
Should the same be done for the snapshot_load_job_bh and
snapshot_delete_job_bh to keep it consistent?
migration/savevm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index c621f2359b..0086b76ab0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3459,7 +3459,7 @@ static int coroutine_fn snapshot_save_job_run(Job *job,
Error **errp)
SnapshotJob *s = container_of(job, SnapshotJob, common);
s->errp = errp;
s->co = qemu_coroutine_self();
- aio_bh_schedule_oneshot(qemu_get_aio_context(),
+ aio_bh_schedule_oneshot(iohandler_get_aio_context(),
snapshot_save_job_bh, job);
qemu_coroutine_yield();
return s->ret ? 0 : -1;
--
2.39.2
- [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context,
Fiona Ebner <=
- Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context, Fabiano Rosas, 2024/06/05
- Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context, Stefan Hajnoczi, 2024/06/06
- Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context, Fiona Ebner, 2024/06/11
- Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context, Stefan Hajnoczi, 2024/06/11
- Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context, Fiona Ebner, 2024/06/12
- Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context, Stefan Hajnoczi, 2024/06/12
- Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context, Fiona Ebner, 2024/06/14
- Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context, Stefan Hajnoczi, 2024/06/18