From: Balamuruhan S
Subject: Re: [Qemu-devel] [PULL 16/16] migration: fix crash in when incoming client channel setup fails
Date: Thu, 28 Jun 2018 15:25:47 +0530
User-agent: Mutt/1.9.2 (2017-12-15)

On Wed, Jun 27, 2018 at 02:56:04PM +0200, Juan Quintela wrote:
> From: Daniel P. Berrangé <address@hidden>
> 
> The way we determine if we can start the incoming migration was
> changed to use migration_has_all_channels() in:
> 
>   commit 428d89084c709e568f9cd301c2f6416a54c53d6d
>   Author: Juan Quintela <address@hidden>
>   Date:   Mon Jul 24 13:06:25 2017 +0200
> 
>     migration: Create migration_has_all_channels
> 
> This method in turn calls multifd_recv_all_channels_created()
> which is hardcoded to always return 'true' when multifd is
> not in use. This is a latent bug...
> 
> ...activated in a following commit where that return result
> ends up acting as the flag to indicate whether it is possible
> to start processing the migration:
> 
>   commit 36c2f8be2c4eb0003ac77a14910842b7ddd7337e
>   Author: Juan Quintela <address@hidden>
>   Date:   Wed Mar 7 08:40:52 2018 +0100
> 
>     migration: Delay start of migration main routines
> 
> This means that if channel initialization fails with normal
> migration, it'll never notice and attempt to start the
> incoming migration regardless and crash on a NULL pointer.
> 
> This can be seen, for example, if a client connects to a server
> requiring TLS, but has an invalid x509 certificate:
> 
> qemu-system-x86_64: The certificate hasn't got a known issuer
> qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: 
> Assertion `mis->from_src_file' failed.
> 
>  #0  0x00007fffebd24f2b in raise () at /lib64/libc.so.6
>  #1  0x00007fffebd0f561 in abort () at /lib64/libc.so.6
>  #2  0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
>  #3  0x00007fffebd1d692 in  () at /lib64/libc.so.6
>  #4  0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized 
> out>) at migration/migration.c:386
>  #5  0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, 
> i1=<optimized out>) at util/coroutine-ucontext.c:116
>  #6  0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6
>  #7  0x0000000000000000 in  ()
> 
> To handle the non-multifd case, we check whether mis->from_src_file
> is non-NULL. With this in place, the migration server drops the
> rejected client and stays around waiting for another, hopefully
> valid, client to arrive.

Hi Juan,

I tried to perform a multifd-enabled migration: from the qemu monitor I
enabled the multifd capability on both source and target,
(qemu) migrate_set_capability x-multifd on
(qemu) migrate -d tcp:127.0.0.1:4444

The migration succeeds and it's cool to have the feature :)
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off 
compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: 
off return-path: off pause-before-switchover: off x-multifd: on dirty-bitmaps: 
off postcopy-blocktime: off late-block-activate: off
Migration status: completed
total time: 1051 milliseconds
downtime: 260 milliseconds
setup: 17 milliseconds
transferred ram: 8270 kbytes
throughput: 143.91 mbps
remaining ram: 0 kbytes
total ram: 4194560 kbytes
duplicate: 940989 pages
skipped: 0 pages
normal: 109635 pages
normal bytes: 438540 kbytes
dirty sync count: 3
page size: 4 kbytes


But when I enable multifd only on the source and not on the target,

source:
x-multifd: on

target:
x-multifd: off

when migration is triggered with,
migrate -d tcp:127.0.0.1:4444 (port I used)

the source VM crashes with a segmentation fault and is lost.

I think the correct usage is to enable multifd on both source and target,
similar to postcopy, but this negative scenario should be handled
properly: error out appropriately instead of losing the VM.

Please correct me if I am missing something.


-- Bala
> 
> Signed-off-by: Daniel P. Berrangé <address@hidden>
> Message-Id: <address@hidden>
> Reviewed-by: Juan Quintela <address@hidden>
> Reviewed-by: Dr. David Alan Gilbert <address@hidden>
> Signed-off-by: Juan Quintela <address@hidden>
> ---
>  migration/migration.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index d075c27886..94d71f8b24 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -518,11 +518,12 @@ void migration_ioc_process_incoming(QIOChannel *ioc)
>   */
>  bool migration_has_all_channels(void)
>  {
> +    MigrationIncomingState *mis = migration_incoming_get_current();
>      bool all_channels;
>  
>      all_channels = multifd_recv_all_channels_created();
>  
> -    return all_channels;
> +    return all_channels && mis->from_src_file != NULL;
>  }
>  
>  /*
> -- 
> 2.17.1
> 
> 
