qemu-devel

From: manish.mishra
Subject: Re: [PATCH] migration: check magic value for deciding the mapping of channels
Date: Tue, 15 Nov 2022 12:37:52 +0530
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.4.1

Thanks Peter

On 14/11/22 10:21 pm, Peter Xu wrote:
Manish,

On Thu, Nov 03, 2022 at 11:47:51PM +0530, manish.mishra wrote:
Yes, but reading early on the main channel is a problem when TLS is enabled. 
Sorry, I may not have put the above comment clearly. I will try to lay out the 
scenario step by step.
1. The main channel is created and the TLS handshake is done for the main channel.
2. The destination side tries to read the magic early on the main channel in 
migration_ioc_process_incoming, but the source has not sent it yet.
3. The source has written the magic to the main channel's file buffer, but it is 
not yet flushed. It is first flushed in ram_save_setup; I mean data is sent on 
the channel only if the qemu file buffer is full or it is explicitly flushed.
4. The source side blocks on multifd_send_sync_main in ram_save_setup before 
flushing the qemu file. But multifd_send_sync_main is blocked on sem_sync until 
the handshake is done for the multiFD channels.
5. The destination side is still waiting to read the magic on the main channel, 
so unless we return from migration_ioc_process_incoming we cannot accept a new 
channel, and the handshake of the multiFD channels is blocked.
6. So basically the source is blocked on the multiFD channels' handshake before 
sending data on the main channel, but the destination is blocked waiting for 
that data before it can acknowledge the multiFD channels and do the handshake, 
so it creates a deadlock.
Why is this issue only happening with TLS?  It sounds like it'll happen as
long as multifd is enabled.


Actually this was happening with TLS because with TLS we do a handshake, so a 
connection is assumed established only after the TLS handshake completes, and 
we flush data from the source only after all channels are established. With 
normal live migration we can continue even if the connection is not yet 
accepted on the destination side, as we do not do any handshake. Basically, in 
normal live migration a connection is assumed established if the connect() call 
was successful, even if it has not been accepted/acked by the destination, so 
that's why this deadlock was not happening.
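
To make the dependency cycle concrete, here is a toy C model of the steps 
above. It is only a sketch, not QEMU code: simulate(), its two flags and the 
outcome enum are names made up for illustration, and each QEMU function is 
reduced to a boolean condition. It assumes the destination has already started 
its early read of the magic on the main channel.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in QEMU. */
enum outcome { MIGRATION_PROCEEDS, MIGRATION_DEADLOCKS };

/*
 * tls:         multifd channels count as established only after a
 *              handshake that the destination must take part in.
 * flush_early: the source flushes the magic to the wire before it
 *              waits for the multifd channels.
 */
static enum outcome simulate(bool tls, bool flush_early)
{
    bool magic_on_wire = flush_early;  /* step 3: otherwise still buffered */
    bool dest_reading_main = true;     /* step 2: early read of the magic */
    bool multifd_ready = false;
    bool progress = true;

    /* Apply every rule until no rule can make progress any more. */
    while (progress) {
        progress = false;

        /* Destination returns from the early read once the magic arrives. */
        if (dest_reading_main && magic_on_wire) {
            dest_reading_main = false;
            progress = true;
        }

        /*
         * Without TLS a successful connect() is enough. With TLS the
         * destination must be free to accept and handshake (step 5).
         */
        if (!multifd_ready && (!tls || !dest_reading_main)) {
            multifd_ready = true;
            progress = true;
        }

        /* Step 4: the source flushes only after the multifd sync returns. */
        if (!magic_on_wire && multifd_ready) {
            magic_on_wire = true;
            progress = true;
        }
    }

    return (magic_on_wire && multifd_ready && !dest_reading_main)
               ? MIGRATION_PROCEEDS
               : MIGRATION_DEADLOCKS;
}
```

In this model only simulate(true, false) gets stuck; dropping TLS or flushing 
the magic early breaks the cycle.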


I'm also thinking whether we should flush in qemu_savevm_state_header() so
at least upgraded src qemu will always flush the headers if it never hurts.

Yes, sure, Peter.

Thanks

Manish Mishra




