From: Wei Wang
Subject: Re: [Qemu-devel] [PATCH v2 3/3] virtio-balloon: add a timer to limit the free page report waiting time
Date: Mon, 26 Feb 2018 12:35:31 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 02/09/2018 08:15 PM, Dr. David Alan Gilbert wrote:
> * Wei Wang (address@hidden) wrote:
>> This patch adds a timer to limit the time that host waits for the free
>> page hints reported by the guest. Users can specify the time in ms via
>> "free-page-wait-time" command line option. If a user doesn't specify a
>> time, host waits till the guest finishes reporting all the free page
>> hints. The policy (wait for all the free page hints to be reported or
>> use a time limit) is determined by the orchestration layer.
>
> That's kind of a get-out; but there's at least two problems:
>   a) With a timeout of 0 (the default) we might hang forever waiting
>      for the guest; broken guests are just too common, we can't do that.
>   b) Even if we were going to do that, you'd have to make sure that
>      migrate_cancel provided a way out.
>   c) How does that work during a savevm snapshot or when the guest is
>      stopped?
>   d) OK, the timer gives us some safety (except c); but how does the
>      orchestration layer ever come up with a 'safe' value for it?
>      Unless we can suggest a safe value that the orchestration layer
>      can use, or a way they can work it out, then they just won't use it.

Hi Dave,

Sorry for my late response. Please see below:

a) I think people would just kill the guest if it is broken. We can also change the default timeout to a non-zero value, for example 1 second, which should be enough for the free page reporting.
b) How about changing it this way: if the timeout happens, the host sends a stop command to the guest and makes virtio_balloon_poll_free_page_hints() return immediately (without waiting for the guest's acknowledgement). The return goes back to the loop in the migration_thread function:

    while (s->state == MIGRATION_STATUS_ACTIVE ||
           s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
        ...
    }

migrate_cancel sets the state to MIGRATION_STATUS_CANCELLING, so the loop condition fails and the migration process stops.
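To make (b) a bit more concrete, here is a rough, self-contained sketch of the idea. It is not QEMU code: the types, the simulated clock, and the names (hint_source, poll_free_page_hints, MIG_ACTIVE, and so on) are all hypothetical stand-ins. The state check inside the poll loop is an assumption added for this sketch, so that a cancel request is also noticed while the host is still waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not QEMU's real types or names. */
enum mig_state { MIG_ACTIVE, MIG_CANCELLING };
enum poll_result { POLL_DONE, POLL_TIMEOUT, POLL_CANCELLED };

struct hint_source {
    int64_t now_ms;     /* simulated clock */
    int64_t step_ms;    /* time consumed by one poll iteration */
    int hints_left;     /* hints the guest still has to report */
    bool stop_sent;     /* set when the host tells the guest to stop */
};

/*
 * Poll until the guest has reported everything, the deadline passes,
 * or the migration leaves the active state. On timeout or cancel the
 * host sends "stop" and returns without waiting for an acknowledgement.
 */
static enum poll_result poll_free_page_hints(struct hint_source *src,
                                             const enum mig_state *state,
                                             int64_t timeout_ms)
{
    assert(src != NULL && state != NULL);
    int64_t deadline = src->now_ms + timeout_ms;

    while (src->hints_left > 0) {
        if (*state != MIG_ACTIVE) {
            src->stop_sent = true;
            return POLL_CANCELLED;
        }
        if (src->now_ms >= deadline) {
            src->stop_sent = true;
            return POLL_TIMEOUT;
        }
        src->hints_left--;              /* consume one reported hint */
        src->now_ms += src->step_ms;    /* advance the simulated clock */
    }
    return POLL_DONE;
}
```

With a 1000 ms limit and 10 ms per iteration, a guest that never finishes is given up on after 100 iterations, while a guest with only a few hints left completes normally and no stop command is sent.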
c) This optimization needs the guest to report. If the guest is stopped, it wouldn't work. How about adding a check of "RUN_STATE" before going into the optimization?
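As a rough illustration of the check in (c), the gate could look something like the sketch below. The enum and function names are hypothetical and only model the idea; they are not QEMU's actual run-state API.

```c
#include <stdbool.h>

/* Hypothetical model of the guest run state; not QEMU's RunState enum. */
enum guest_run_state { GUEST_RUNNING, GUEST_PAUSED, GUEST_SAVE_VM };

/*
 * The optimization depends on the guest actively reporting hints, so it
 * is only attempted when the feature is available and the guest is running.
 * In every other case the caller falls back to sending all pages.
 */
static bool can_use_free_page_hints(enum guest_run_state rs,
                                    bool feature_negotiated)
{
    return feature_negotiated && rs == GUEST_RUNNING;
}
```

A stopped guest or a savevm snapshot would then skip the optimization entirely rather than wait on a guest that cannot respond.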
d) Yes. Normally it is faster to wait for the guest to report all the free pages. We could probably just hardcode a value (e.g. 1s) for now instead of making it configurable by users. The timeout would then only serve to handle the case where the guest is broken. What do you think?

Best,
Wei