From: Balamuruhan S
Subject: Re: [Qemu-devel] [PULL 16/16] migration: fix crash in when incoming client channel setup fails
Date: Thu, 28 Jun 2018 15:25:47 +0530
User-agent: Mutt/1.9.2 (2017-12-15)

On Wed, Jun 27, 2018 at 02:56:04PM +0200, Juan Quintela wrote:
> From: Daniel P. Berrangé <address@hidden>
>
> The way we determine if we can start the incoming migration was
> changed to use migration_has_all_channels() in:
>
> commit 428d89084c709e568f9cd301c2f6416a54c53d6d
> Author: Juan Quintela <address@hidden>
> Date: Mon Jul 24 13:06:25 2017 +0200
>
> migration: Create migration_has_all_channels
>
> This method in turn calls multifd_recv_all_channels_created()
> which is hardcoded to always return 'true' when multifd is
> not in use. This is a latent bug...
>
> ...activated in a following commit where that return result
> ends up acting as the flag to indicate whether it is possible
> to start processing the migration:
>
> commit 36c2f8be2c4eb0003ac77a14910842b7ddd7337e
> Author: Juan Quintela <address@hidden>
> Date: Wed Mar 7 08:40:52 2018 +0100
>
> migration: Delay start of migration main routines
>
> This means that if channel initialization fails with normal
> migration, it'll never notice and attempt to start the
> incoming migration regardless and crash on a NULL pointer.
>
> This can be seen, for example, if a client connects to a server
> requiring TLS, but has an invalid x509 certificate:
>
> qemu-system-x86_64: The certificate hasn't got a known issuer
> qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co:
> Assertion `mis->from_src_file' failed.
>
> #0 0x00007fffebd24f2b in raise () at /lib64/libc.so.6
> #1 0x00007fffebd0f561 in abort () at /lib64/libc.so.6
> #2 0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
> #3 0x00007fffebd1d692 in () at /lib64/libc.so.6
> #4 0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized
> out>) at migration/migration.c:386
> #5 0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>,
> i1=<optimized out>) at util/coroutine-ucontext.c:116
> #6 0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6
> #7 0x0000000000000000 in ()
>
> To handle the non-multifd case, we check whether mis->from_src_file
> is non-NULL. With this in place, the migration server drops the
> rejected client and stays around waiting for another, hopefully
> valid, client to arrive.
Hi Juan,
I tried a multifd-enabled migration. From the QEMU monitor I enabled the
multifd capability on both source and target, then started the migration:
(qemu) migrate_set_capability x-multifd on
(qemu) migrate -d tcp:127.0.0.1:4444
The migration succeeds, and it's cool to have the feature :)
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off
compress: off events: off postcopy-ram: off x-colo: off release-ram: off block:
off return-path: off pause-before-switchover: off x-multifd: on dirty-bitmaps:
off postcopy-blocktime: off late-block-activate: off
Migration status: completed
total time: 1051 milliseconds
downtime: 260 milliseconds
setup: 17 milliseconds
transferred ram: 8270 kbytes
throughput: 143.91 mbps
remaining ram: 0 kbytes
total ram: 4194560 kbytes
duplicate: 940989 pages
skipped: 0 pages
normal: 109635 pages
normal bytes: 438540 kbytes
dirty sync count: 3
page size: 4 kbytes
But when I enable multifd only on the source and not on the target,
source:
x-multifd: on
target:
x-multifd: off
and trigger the migration with
migrate -d tcp:127.0.0.1:4444 (the port I used)
the VM is lost on the source with a segmentation fault.
I think the correct usage is to enable multifd on both source and target,
similar to postcopy, but in this negative scenario we should handle the
mismatch so that the VM is not lost and the migration errors out
appropriately instead.
Please correct me if I missed something.
-- Bala
>
> Signed-off-by: Daniel P. Berrangé <address@hidden>
> Message-Id: <address@hidden>
> Reviewed-by: Juan Quintela <address@hidden>
> Reviewed-by: Dr. David Alan Gilbert <address@hidden>
> Signed-off-by: Juan Quintela <address@hidden>
> ---
> migration/migration.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index d075c27886..94d71f8b24 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -518,11 +518,12 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
> */
> bool migration_has_all_channels(void)
> {
> + MigrationIncomingState *mis = migration_incoming_get_current();
> bool all_channels;
>
> all_channels = multifd_recv_all_channels_created();
>
> - return all_channels;
> + return all_channels && mis->from_src_file != NULL;
> }
>
> /*
> --
> 2.17.1
>
>
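
For readers following the quoted patch, here is a minimal standalone sketch of
the check it adds. It is not QEMU code: IncomingStateSketch, the *_sketch
helpers and the toy multifd counters are hypothetical stand-ins, and only the
from_src_file test and the "returns true when multifd is not in use" behaviour
are taken from the commit message and diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MigrationIncomingState: only the one field
 * the patch inspects is modelled here. */
typedef struct {
    void *from_src_file; /* stays NULL if main channel setup failed */
} IncomingStateSketch;

static IncomingStateSketch incoming_sketch;

/* Assumed toy multifd state; not QEMU's real bookkeeping. */
static bool multifd_in_use;
static int multifd_channels_created;
static int multifd_channels_expected;

/* As the commit message describes: always true when multifd is not in use. */
static bool multifd_recv_all_channels_created_sketch(void)
{
    if (!multifd_in_use) {
        return true;
    }
    return multifd_channels_created == multifd_channels_expected;
}

/* Behaviour before the patch: the multifd answer alone decides. */
static bool has_all_channels_before_patch(void)
{
    return multifd_recv_all_channels_created_sketch();
}

/* Behaviour after the patch: the main channel must also exist. */
static bool has_all_channels_after_patch(void)
{
    return multifd_recv_all_channels_created_sketch() &&
           incoming_sketch.from_src_file != NULL;
}

/* Walks through the failed-setup case from the commit message. */
static void sketch_self_check(void)
{
    int dummy_channel = 0; /* placeholder for a real channel object */

    /* Normal (non-multifd) migration whose channel setup failed. */
    multifd_in_use = false;
    incoming_sketch.from_src_file = NULL;
    assert(has_all_channels_before_patch());  /* old code would proceed */
    assert(!has_all_channels_after_patch());  /* new code refuses to start */

    /* Once the main channel exists, the patched check passes. */
    incoming_sketch.from_src_file = &dummy_channel;
    assert(has_all_channels_after_patch());

    /* Do not leave a pointer to a local behind. */
    incoming_sketch.from_src_file = NULL;
}
```

Calling sketch_self_check() from a test harness exercises both paths. The
sketch covers only the incoming side that the patch touches. It does not model
the source-side segmentation fault reported above for a multifd capability
mismatch.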