
Re: [PATCH] migration: check magic value for deciding the mapping of cha


From: manish.mishra
Subject: Re: [PATCH] migration: check magic value for deciding the mapping of channels
Date: Fri, 4 Nov 2022 00:24:25 +0530
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.4.1


On 03/11/22 11:47 pm, manish.mishra wrote:

On 03/11/22 11:27 pm, Daniel P. Berrangé wrote:
On Thu, Nov 03, 2022 at 11:06:23PM +0530, manish.mishra wrote:
On 03/11/22 10:57 pm, Daniel P. Berrangé wrote:
On Thu, Nov 03, 2022 at 10:04:54PM +0530, manish.mishra wrote:
On 03/11/22 2:59 pm, Daniel P. Berrangé wrote:
On Thu, Nov 03, 2022 at 02:50:25PM +0530, manish.mishra wrote:
On 01/11/22 9:15 pm, Daniel P. Berrangé wrote:
On Tue, Nov 01, 2022 at 09:10:14PM +0530, manish.mishra wrote:
On 01/11/22 8:21 pm, Daniel P. Berrangé wrote:
On Tue, Nov 01, 2022 at 02:30:29PM +0000, manish.mishra wrote:
diff --git a/migration/migration.c b/migration/migration.c
index 739bb683f3..f4b6f278a9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -733,31 +733,40 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
      {
          MigrationIncomingState *mis = migration_incoming_get_current();
          Error *local_err = NULL;
-    bool start_migration;
          QEMUFile *f;
+    bool default_channel = true;
+    uint32_t channel_magic = 0;
+    int ret = 0;
-    if (!mis->from_src_file) {
-        /* The first connection (multifd may have multiple) */
+    if (migrate_use_multifd() && !migration_in_postcopy()) {
+        ret = qio_channel_read_peek_all(ioc, (void *)&channel_magic,
+ sizeof(channel_magic), &local_err);
+
+        if (ret != 1) {
+            error_propagate(errp, local_err);
+            return;
+        }
....and thus this will fail for TLS channels AFAICT.
Yes, thanks for the quick review, Daniel. You pointed this out earlier too;
sorry, I missed it. I will add another check, !migrate_use_tls(), in V2.
But we need this problem fixed with TLS too, so just excluding it
isn't right. IMHO we need to modify the migration code so we can
read the magic earlier, instead of peeking.


With regards,
Daniel
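
As an illustration of the peek-versus-read distinction discussed above, here is a minimal sketch on a plain stream socket. It assumes the magic is a 4-byte big-endian value at the start of the stream; the SKETCH_* constants and the helper names peek_magic() and is_main_channel() are hypothetical, and the magic values are assumptions that should be checked against the QEMU tree. It is not the patch's qio_channel_read_peek_all() implementation. On a TLS channel the bytes on the socket are encrypted records, so a raw socket peek like this would not yield the magic.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed values for illustration only; verify against the QEMU source. */
#define SKETCH_VM_FILE_MAGIC 0x5145564dU  /* main migration stream */
#define SKETCH_MULTIFD_MAGIC 0x11223344U  /* multifd channel */

/*
 * Look at the first 4 bytes of a connected stream socket without
 * consuming them. Returns 0 on success, -1 on error or a short peek.
 * A real implementation would retry until all 4 bytes are available.
 */
static int peek_magic(int fd, uint32_t *magic)
{
    uint32_t raw;
    ssize_t n;

    assert(magic != NULL);
    n = recv(fd, &raw, sizeof(raw), MSG_PEEK);
    if (n != (ssize_t)sizeof(raw)) {
        return -1;
    }
    *magic = ntohl(raw);   /* wire format assumed big-endian */
    return 0;
}

/* The channel is treated as the main one if it starts with the file magic. */
static bool is_main_channel(uint32_t magic)
{
    return magic == SKETCH_VM_FILE_MAGIC;
}
```

Because the bytes are only peeked, a later ordinary read on the same socket still returns the magic, so the existing stream parsing does not need to change.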
Hi Daniel, I was trying TLS migrations. What I see is that TLS session
creation does a handshake. So if we read ahead in
migration_ioc_process_incoming for the default channel, it deadlocks,
because the client sends the magic only after the multiFD channels are
set up, which also requires a TLS handshake.
By the time we get to migration_ioc_process_incoming, the TLS handshake
has already been performed.

migration_channel_process_incoming
       -> migration_ioc_process_incoming

vs

migration_channel_process_incoming
       -> migration_tls_channel_process_incoming
           -> migration_tls_incoming_handshake
         -> migration_channel_process_incoming
             ->  migration_ioc_process_incoming

Yes, sorry, I thought we block on the source side until the handshake is done,
but that is not true. I then checked why that deadlock is happening. This is
where the deadlock happens:

static int ram_save_setup(QEMUFile *f, void *opaque) {
+
+
      ram_control_before_iterate(f, RAM_CONTROL_SETUP);
      ram_control_after_iterate(f, RAM_CONTROL_SETUP);

      ret =  multifd_send_sync_main(f);
      if (ret < 0) {
          return ret;
      }

      qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
      qemu_fflush(f);

      return 0;
}

Now suppose we block in migration_ioc_process_incoming to read the magic
value from the channel. The client actually sends that value only when
this qemu_fflush is done. Before this qemu_fflush we wait in
multifd_send_sync_main, which requires that the TLS handshake is done
for the multiFD channels, as it blocks on sem_sync, which is posted by
multifd_send_thread, which is called after the handshake. But on the
destination side we are blocked in migration_ioc_process_incoming()
waiting to read something from the default channel, hence the handshake
for the multiFD channels cannot happen. To me this looks unresolvable,
whatever way we try to manipulate the stream, unless we make some
changes on the source side.
The TLS handshake is already complete when migration_ioc_process_incoming
is blocking on read.

Regardless of which channel we're talking about, the TLS handshake is
always performed & finished before we try to either send or recv any
data.
Yes Daniel, I agree on that. In this case the TLS handshake is done for
the default channel, so we went into migration_ioc_process_incoming for
the default channel. But if we try to read some data there, we do not
get any, because the data on the default channel has not yet been
flushed by the source. On the source side the data is flushed in the
function I pointed to above, which waits for the multiFD channels to be
set up with a handshake before flushing. I mean data is sent only when
the buffer is full or explicitly flushed; until then it just sits in
the buffer. But the multiFD handshake cannot complete until we return
from migration_ioc_process_incoming for the default channel, which
waits indefinitely because nothing has been sent on the channel yet.
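
The buffering behaviour described above can be sketched with a tiny buffered writer over a file descriptor. This is only an illustration, not the QEMUFile code: SketchFile, sketch_put_be32(), sketch_flush(), sketch_readable() and the buffer size are all hypothetical. The point it shows is that a queued value is invisible to the peer until the buffer fills up or an explicit flush happens.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define SKETCH_BUF_SIZE 32768   /* hypothetical buffer size */

typedef struct {
    int fd;                       /* underlying channel */
    uint8_t buf[SKETCH_BUF_SIZE]; /* bytes queued but not yet sent */
    size_t len;                   /* number of queued bytes */
} SketchFile;

/* Write every queued byte to the fd. Returns 0 on success, -1 on error. */
static int sketch_flush(SketchFile *f)
{
    size_t off = 0;

    assert(f != NULL);
    while (off < f->len) {
        ssize_t n = write(f->fd, f->buf + off, f->len - off);
        if (n < 0) {
            return -1;
        }
        off += (size_t)n;
    }
    f->len = 0;
    return 0;
}

/* Queue a big-endian 32-bit value; only flush if the buffer is full. */
static int sketch_put_be32(SketchFile *f, uint32_t v)
{
    assert(f != NULL);
    if (f->len + 4 > sizeof(f->buf) && sketch_flush(f) < 0) {
        return -1;
    }
    f->buf[f->len++] = (uint8_t)(v >> 24);
    f->buf[f->len++] = (uint8_t)(v >> 16);
    f->buf[f->len++] = (uint8_t)(v >> 8);
    f->buf[f->len++] = (uint8_t)v;
    return 0;
}

/* True if the reader side has data available right now (never blocks). */
static bool sketch_readable(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };

    return poll(&p, 1, 0) == 1 && (p.revents & POLLIN) != 0;
}
```

With a pipe standing in for the main channel, sketch_readable() on the read end stays false after sketch_put_be32() and becomes true only after sketch_flush(), which mirrors a destination waiting for a magic value that the source has written but not yet flushed.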
On the source side, if we're in ram_save_setup then the TLS
handshake is already complete for the main channel. In fact
I think the TLS handshake should act as a serialization
point that prevents the entire bug. It should be guaranteed
that the main channel is fully received by the dest, and
transferring data, before we even get to establishing the
multifd channels.


Yes, Daniel, the TLS handshake could serialize things, but the issue is that
on the source side the handshake is done in the background by another thread;
we do not actually block, so it is still possible that a multiFD connection is
accepted first on the destination side.


Oh, I see now: the TLS handshake is done in a different thread only for the
multiFD channels; for the main channel the handshake is blocking, so I agree
this bug should not be possible with TLS. So does the current patch work with
one more change: we do not do the read peek for TLS cases and fall back to the
older way? A normal read ahead does not work with TLS anyway, because of the
deadlock described earlier.
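
The proposal above can be written down as a small decision function. This is a hedged sketch of the idea only: SketchIncoming, its fields, the helper names and the magic constant are hypothetical stand-ins, not the actual QEMU state or the final patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_VM_FILE_MAGIC 0x5145564dU  /* assumed main-channel magic */

typedef struct {
    bool use_multifd;        /* multifd capability enabled */
    bool in_postcopy;        /* migration is in postcopy */
    bool use_tls;            /* channels are TLS-wrapped */
    bool have_main_channel;  /* a main channel was already accepted */
    uint32_t peeked_magic;   /* host-order magic; valid only if peeked */
} SketchIncoming;

/* Peek only for plain (non-TLS) multifd migrations outside postcopy. */
static bool sketch_can_peek(const SketchIncoming *s)
{
    assert(s != NULL);
    return s->use_multifd && !s->in_postcopy && !s->use_tls;
}

/* Decide whether the new connection is the default (main) channel. */
static bool sketch_is_default_channel(const SketchIncoming *s)
{
    if (sketch_can_peek(s)) {
        return s->peeked_magic == SKETCH_VM_FILE_MAGIC;
    }
    /* Older way: the first connection is taken to be the main channel. */
    return !s->have_main_channel;
}
```

In the TLS case the function never looks at peeked_magic, so the ordering guarantee has to come from the blocking main-channel handshake discussed above.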

Thanks

Manish Mishra



All we need do is read the magic bytes early, regardless of
whether it's a plain or TLS channel, and it should do the right
thing AFAICT.


Yes, but reading early on the main channel is an issue when TLS is enabled.
Sorry, I may not have put the above comment clearly. I will try to describe
the scenario step by step.
1. The main channel is created and the TLS handshake is done for the main
channel.
2. The destination side tries to read the magic early on the main channel in
migration_ioc_process_incoming, but the source has not sent it yet.
3. The source has written the magic to the main channel's file buffer, but it
is not yet flushed. It is flushed for the first time in ram_save_setup; I mean
data is sent on the channel only if the QEMU file buffer is full or explicitly
flushed.
4. The source side blocks on multifd_send_sync_main in ram_save_setup before
flushing the QEMU file. But multifd_send_sync_main is blocked on sem_sync
until the handshake is done for the multiFD channels.
5. The destination side is still waiting to read the magic on the main
channel. Unless we return from migration_ioc_process_incoming we cannot accept
a new channel, so the handshake of the multiFD channels is blocked.
6. So basically the source is blocked on the multiFD channel handshake before
sending data on the main channel, but the destination is blocked waiting for
that data before it can accept the multiFD channels and do the handshake, so
it creates a deadlock.
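
The steps above can be read as a wait-for cycle. The following sketch models each blocking point as a node that waits on at most one other node and checks for a cycle. The node names and has_deadlock() are hypothetical labels for the steps in this thread, not QEMU symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    SRC_FLUSH_MAIN,      /* source flushes the magic on the main channel */
    SRC_MULTIFD_SYNC,    /* multifd_send_sync_main returns */
    MULTIFD_HANDSHAKE,   /* TLS handshake on the multiFD channels */
    DST_ACCEPT_MULTIFD,  /* destination accepts new channels */
    DST_READ_MAGIC,      /* destination reads the magic on the main channel */
    N_STEPS
};

#define NO_WAIT (-1)

/* Fill the wait-for table; the last edge exists only with an early read. */
static void build_waits(int waits_for[N_STEPS], bool dest_blocks_on_magic)
{
    assert(waits_for != NULL);
    waits_for[SRC_FLUSH_MAIN] = SRC_MULTIFD_SYNC;
    waits_for[SRC_MULTIFD_SYNC] = MULTIFD_HANDSHAKE;
    waits_for[MULTIFD_HANDSHAKE] = DST_ACCEPT_MULTIFD;
    waits_for[DST_ACCEPT_MULTIFD] = dest_blocks_on_magic ? DST_READ_MAGIC
                                                         : NO_WAIT;
    waits_for[DST_READ_MAGIC] = SRC_FLUSH_MAIN;
}

/* Follow the wait-for chain from every node; a return to start is a cycle. */
static bool has_deadlock(const int waits_for[N_STEPS])
{
    assert(waits_for != NULL);
    for (int start = 0; start < N_STEPS; start++) {
        int cur = start;
        for (int hops = 0; hops < N_STEPS && cur != NO_WAIT; hops++) {
            cur = waits_for[cur];
            if (cur == start) {
                return true;
            }
        }
    }
    return false;
}
```

With dest_blocks_on_magic set, the chain returns to its starting node, which matches step 6. Without that edge, the destination can accept the multiFD channels and the chain terminates.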

I am still not sure if I could put it clearly :)

Thanks

Manish Mishra

With regards,
Daniel



