From: Fei Li
Subject: Re: [Qemu-devel] [PATCH RFC v7 5/9] migration: fix the multifd code when sending less channels
Date: Tue, 4 Dec 2018 15:32:42 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

Hi Juan,

Kindly ping again. :)

Have a nice day, thanks
Fei

On 11/12/2018 12:43 PM, Fei Li wrote:
> Hi Juan,
>
> Kindly ping, as this multifd migration topic needs your suggestions. :)
>
> Have a nice day, thanks
> Fei
>
> On 11/03/2018 12:33 AM, Dr. David Alan Gilbert wrote:
>> * Peter Xu (address@hidden) wrote:
>>> On Fri, Nov 02, 2018 at 11:00:24AM +0800, Fei Li wrote:
>>>> On 11/02/2018 10:37 AM, Peter Xu wrote:
>>>>> On Thu, Nov 01, 2018 at 06:17:11PM +0800, Fei Li wrote:
>>>>>> Set the migration state to "failed" instead of "setup" when failing
>>>>>> to send packet via some channel.
>>>>> Could you please provide more information in the commit message?
>>>>> E.g., what will happen without this patch? Will it crash the
>>>>> source, stall the source migration, or something else? Otherwise
>>>>> it's a bit hard for me to understand what this patch is for.
>>>> Sorry for the inadequate description. I intended to say that when
>>>> live migration using multifd fails, e.g. when sending fewer
>>>> channels, the src status displays "setup" when running
>>>> `info migrate`. I assume we should tell users that the
>>>> "Migration status" is "failed" now (along with the failure reason).
>>>>
>>>> The current src status when failing in
>>>> multifd_new_send_channel_async():
>>>>
>>>> (qemu) migrate_set_capability x-multifd on
>>>> (qemu) migrate_set_parameter x-multifd-channels 4
>>>> (qemu) migrate -d tcp:192.168.190.98:4444
>>>> (qemu) qemu-system-x86_64: failed in multifd_new_send_channel_async due to...
>>>> (qemu) info migrate
>>>> globals:
>>>> store-global-state: on
>>>> only-migratable: off
>>>> send-configuration: on
>>>> send-section-footer: on
>>>> decompress-error-check: on
>>>> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: on dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off
>>>> Migration status: setup
>>>> total time: 0 milliseconds
>>> Thanks for the information. I had a quick look.
>>>
>>> For now we do this:
>>>
>>>   multifd_save_setup (without waiting for channels to be ready)
>>>   create thread migration_thread
>>>   (in thread)
>>>     ram_save_setup
>>>       multifd_send_sync_main (wait for the channels)
>>>
>>> The thing is that we don't get a notification when one of the
>>> multifd channels has failed. IMHO instead of setting the global
>>> migration state in a per-channel function, we should just report
>>> the error upwards, then the main thread should decide how to change
>>> the state machine of the migration.
>> Best to wait for Juan on that; I've got vague memories that reporting
>> errors among the threads was a bit tricky.
>>
>> Dave
>>> And we have set it in migrate_set_error() after all, so the main
>>> thread should be able to know somehow (though IMHO I'd even prefer
>>> to have a per-channel variable to keep the state of the channel,
>>> then the per-channel functions won't touch any globals, which offers
>>> better isolation).
>>>
>>> I'm not sure what Juan thinks about it, but I'd prefer some work to
>>> provide such isolation and also some mechanism to allow the main
>>> thread to detect the per-channel errors not only during the setup
>>> phase but also during the migration (e.g., when the network is
>>> suddenly down). Then we don't touch any globals (e.g., we shouldn't
>>> call migrate_get_current in any per-channel function like
>>> multifd_new_send_channel_async).
>>>>> Normally I would prefer to not touch global states in
>>>>> feature-specific code paths, but I'd like to know the problem more
>>>>> first...
>>>>>
>>>>> Thanks,
>>>>>
>>>>>> Cc: Peter Xu <address@hidden>
>>>>>> Signed-off-by: Fei Li <address@hidden>
>>>>>> ---
>>>>>>  migration/ram.c | 2 ++
>>>>>>  1 file changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>>>> index 4db3b3e8f4..c84d164fc8 100644
>>>>>> --- a/migration/ram.c
>>>>>> +++ b/migration/ram.c
>>>>>> @@ -1072,6 +1072,7 @@ out:
>>>>>>  static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>>>>>>  {
>>>>>>      MultiFDSendParams *p = opaque;
>>>>>> +    MigrationState *s = migrate_get_current();
>>>>>>      QIOChannel *sioc = QIO_CHANNEL(qio_task_get_source(task));
>>>>>>      Error *local_err = NULL;
>>>>>>
>>>>>> @@ -1083,6 +1084,7 @@ static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
>>>>>>          if (multifd_save_cleanup(&local_err) != 0) {
>>>>>>              migrate_set_error(migrate_get_current(), local_err);
>>>>>>          }
>>>>>> +        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>>>>>>      } else {
>>>>>>          p->c = QIO_CHANNEL(sioc);
>>>>>>          qio_channel_set_delay(p->c, false);
>>>>>> --
>>>>>> 2.13.7
>>>>> Regards,
>>> Regards,
>>>
>>> --
>>> Peter Xu
>> --
>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
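
For illustration only, the isolation Peter Xu suggests in this thread (a per-channel variable holding the channel's state, with the main thread deciding how the migration state machine changes) might look roughly like the sketch below. Every name in it (Channel, Migration, MigStatus, migration_init, channel_setup_done, migration_check_channels) is a hypothetical stand-in invented for this example and is not QEMU's actual API; it uses plain C11 atomics rather than QEMU's own helpers, and it only covers the SETUP to FAILED transition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CHANNELS 4

/* Hypothetical stand-ins for illustration; not QEMU's real types. */
typedef enum { MIG_SETUP, MIG_ACTIVE, MIG_FAILED } MigStatus;

typedef struct {
    atomic_bool failed;   /* set only by this channel's own callback */
    const char *err;      /* failure reason, written before 'failed' */
} Channel;

typedef struct {
    _Atomic int state;               /* changed only by the main thread */
    Channel channels[NUM_CHANNELS];
} Migration;

void migration_init(Migration *m)
{
    atomic_init(&m->state, MIG_SETUP);
    for (int i = 0; i < NUM_CHANNELS; i++) {
        atomic_init(&m->channels[i].failed, false);
        m->channels[i].err = NULL;
    }
}

/* Per-channel completion callback: records the error in the channel
 * itself and never touches the global migration state. */
void channel_setup_done(Channel *c, const char *err)
{
    assert(c != NULL);
    if (err != NULL) {
        c->err = err;
        atomic_store(&c->failed, true);   /* publish after err is set */
    }
}

/* Main thread: scan the channels. On the first failure, try to move
 * the state from SETUP to FAILED with a compare-and-swap and return
 * the reason. Returns NULL when no channel has reported a failure. */
const char *migration_check_channels(Migration *m)
{
    for (int i = 0; i < NUM_CHANNELS; i++) {
        Channel *c = &m->channels[i];
        if (atomic_load(&c->failed)) {
            int expected = MIG_SETUP;
            atomic_compare_exchange_strong(&m->state, &expected, MIG_FAILED);
            return c->err;
        }
    }
    return NULL;
}
```

In this sketch a channel callback would call channel_setup_done() with a non-NULL reason on failure, and the main thread would call migration_check_channels() at the points where it currently waits for the channels. Whether this fits the real threading model is exactly the open question Dave raises about reporting errors among the threads.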