qemu-devel

Re: Properly quitting qemu immediately after failing migration


From: Vladimir Sementsov-Ogievskiy
Subject: Re: Properly quitting qemu immediately after failing migration
Date: Mon, 29 Jun 2020 17:18:10 +0300

29.06.2020 16:48, Max Reitz wrote:
> Hi,
>
> In an iotest, I’m trying to quit qemu immediately after a migration has
> failed.  Unfortunately, that doesn’t seem to be possible in a clean way:
> migrate_fd_cleanup() runs only at some point after the migration state
> is already “failed”, so if I just wait for that “failed” state and
> immediately quit, some cleanup functions may not have been run yet.
>
> This is a problem with dirty bitmap migration at least, because it
> increases the refcount on all block devices that are to be migrated, so
> if we don’t call the cleanup function before quitting, the refcount will
> stay elevated and bdrv_close_all() will hit an assertion because those
> block devices are still around after blk_remove_all_bs() and
> blockdev_close_all_bdrv_states().
>
> In practice this particular issue might not be that big of a problem,
> because it just means qemu aborts when the user intended to let it quit
> anyway.  But on one hand I could imagine that there are other clean-up
> paths that should definitely run before qemu quits (although I don’t
> know), and on the other, it’s a problem for my test.
>
> I tried working around the problem for my test by waiting on “Unable to
> write” appearing on stderr, because that indicates that
> migrate_fd_cleanup()’s error_report_err() has been reached.  But on one
> hand, that isn’t really nice, and on the other, it doesn’t even work
> when the failure is on the source side (because then there is no
> s->error for migrate_fd_cleanup() to report).
>
> In all, I’m asking:
> (1) Is there a nice solution for me now to delay quitting qemu until the
> failed migration has been fully resolved, including the clean-up?
>
> (2) Isn’t it a problem if qemu crashes when you issue “quit” via QMP at
> the wrong time?  Like, maybe lingering subprocesses when using “exec”?
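
For illustration, the race described in the quoted message can be modelled
with a minimal Python sketch. It polls the QMP command query-migrate until
a terminal status shows up. Everything else is an assumption made for the
example: FakeQemu is a hypothetical stand-in for a QMP connection (it is
not part of qemu or of the iotests framework), and its cleanup_done flag
only models "migrate_fd_cleanup() has run". The point is that seeing
"failed" says nothing about whether the cleanup has already happened.

```python
import time


def wait_for_migration_end(qmp, timeout=10.0, interval=0.01):
    """Poll query-migrate until the migration reaches a terminal status.

    `qmp` is any callable that takes a QMP command name and returns the
    decoded reply as a dict (hypothetical interface for this sketch).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = qmp("query-migrate")["return"].get("status")
        if status in ("completed", "failed", "cancelled"):
            return status
        time.sleep(interval)
    raise TimeoutError("migration did not reach a terminal status in time")


class FakeQemu:
    """Hypothetical stand-in for a QMP connection, used only to model the race."""

    def __init__(self, polls_until_failed=3):
        self._polls_left = polls_until_failed
        # Models whether migrate_fd_cleanup() has run; in this model it is
        # never set by the status change itself, only "later".
        self.cleanup_done = False

    def __call__(self, command):
        if command != "query-migrate":
            raise ValueError(f"unsupported command in this sketch: {command}")
        if self._polls_left > 0:
            self._polls_left -= 1
            return {"return": {"status": "active"}}
        return {"return": {"status": "failed"}}


if __name__ == "__main__":
    vm = FakeQemu()
    status = wait_for_migration_end(vm)
    # The status is already "failed", but the modelled cleanup has not run,
    # so issuing "quit" at this point is the problematic window.
    print(status, vm.cleanup_done)
```

A test that quits as soon as wait_for_migration_end() returns would sit
exactly in that window, which is why a status poll alone is not enough.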



I'll look more closely tomorrow, but as a short answer: try my series
"[PATCH v2 00/22] Fix error handling during bitmap postcopy". It
handles various problems around migration failures and qemu shutdown,
so it will probably help.


--
Best regards,
Vladimir
