Re: [PATCH v2 0/1] migration: multifd live migration improvement


From: Dr. David Alan Gilbert
Subject: Re: [PATCH v2 0/1] migration: multifd live migration improvement
Date: Mon, 6 Dec 2021 19:54:41 +0000
User-agent: Mutt/2.1.3 (2021-09-10)

* Li Zhang (lizhang@suse.de) wrote:
> When testing live migration with multifd channels (8, 16, or a bigger number)
> and using qemu -incoming (without "defer"), if a network error occurs
> (for example, triggering the kernel SYN flooding detection),
> the migration fails and the guest hangs forever.
> 
> The test environment and command lines are as follows:
> 
> QEMU version: QEMU emulator version 6.2.91 (v6.2.0-rc1-47-gc5fbdd60cf)
> Host OS: SLE 15 with kernel 5.14.5-1-default
> Network card: mlx5 100Gbps
> Network card: Intel Corporation I350 Gigabit (1Gbps)
> 
> Source:
> qemu-system-x86_64 -M q35 -smp 32 -nographic \
>         -serial telnet:10.156.208.153:4321,server,nowait \
>         -m 4096 -enable-kvm -hda /var/lib/libvirt/images/openSUSE-15.3.img \
>         -monitor stdio
> Dest:
> qemu-system-x86_64 -M q35 -smp 32 -nographic \
>         -serial telnet:10.156.208.154:4321,server,nowait \
>         -m 4096 -enable-kvm -hda /var/lib/libvirt/images/openSUSE-15.3.img \
>         -monitor stdio \
>         -incoming tcp:1.0.8.154:4000
> 
> (qemu) migrate_set_parameter max-bandwidth 100G
> (qemu) migrate_set_capability multifd on
> (qemu) migrate_set_parameter multifd-channels 16
> 
> The guest hangs when executing the command: migrate -d tcp:1.0.8.154:4000.
> 
> If a network problem happens, the TCP ACK is not received by the
> destination, and the destination resets the connection with RST.
> 
> No.  Time      Source     Destination  Protocol  Length  Info
> 119  1.021169  1.0.8.153  1.0.8.154    TCP       1410    60166 → 4000 [PSH, ACK] Seq=65 Ack=1 Win=62720 Len=1344 TSval=1338662881 TSecr=1399531897
> 125  1.021181  1.0.8.154  1.0.8.153    TCP       54      4000 → 60166 [RST] Seq=1 Win=0 Len=0
> 
> kernel log:
> [334520.229445] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies.  Check SNMP counters.
> [334562.994919] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies.  Check SNMP counters.
> [334695.519927] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies.  Check SNMP counters.
> [334734.689511] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies.  Check SNMP counters.
> [335687.740415] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies.  Check SNMP counters.
> [335730.013598] TCP: request_sock_TCP: Possible SYN flooding on port 4000. Sending cookies.  Check SNMP counters.

Should we document somewhere how to avoid that?  Is there something we
should be doing in the connection code to avoid it?
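
As a starting point for that discussion, here is a minimal sketch of one
thing the listening side could try: sizing the listen() backlog to the
number of connections the source opens at once. This is an assumption
about a possible mitigation, not a description of what QEMU's connection
code does. The helper names backlog_for_channels() and open_listener()
are hypothetical, and "channels + 1" assumes one main migration
connection plus one connection per multifd channel. The kernel also caps
the backlog at net.core.somaxconn, so that sysctl may need raising too.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: room for one pending connection per multifd
 * channel plus the main migration channel (an assumption, see above).
 */
int backlog_for_channels(int channels)
{
    assert(channels >= 0);
    return channels + 1;
}

/*
 * Hypothetical helper: open a TCP listener on the loopback address with
 * an explicit backlog. Returns the listening fd, or -1 on failure.
 * Passing port 0 lets the kernel pick a free port.
 */
int open_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With 16 channels, open_listener(4000, backlog_for_channels(16)) would
ask for a backlog of 17. Whether that is enough to keep the kernel from
falling back to SYN cookies in the reported setup is untested here.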

Dave

> There are two problems here:
> 1. On the send side, the main thread is blocked in qemu_thread_join and
>    the send threads are blocked in sendmsg.
> 2. On the receive side, the receive threads are blocked in qemu_sem_wait,
>    waiting for a semaphore.
> 
> This patch fixes the first problem, so the guest no longer hangs.
> There is no good solution for the second problem yet.
> 
> Li Zhang (1):
>   multifd: Shut down the QIO channels to avoid blocking the send threads
>     when they are terminated.
> 
>  migration/multifd.c | 3 +++
>  1 file changed, 3 insertions(+)
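
For readers unfamiliar with the technique named in the patch title, the
following is a standalone sketch of the general idea using plain POSIX
sockets rather than QEMU's QIO channel API: a thread stuck in a blocking
send() is released when another thread calls shutdown() on the same
socket, after which pthread_join() can complete. The struct and function
names are hypothetical and nothing here is taken from the patch itself.
The behaviour shown is what I would expect on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

struct sender {
    int fd;          /* socket the thread writes to */
    int saved_errno; /* errno from the send() call that failed */
};

/* Keep sending until send() fails; nobody reads, so it soon blocks. */
static void *sender_main(void *opaque)
{
    struct sender *s = opaque;
    char buf[4096];

    memset(buf, 0, sizeof(buf));
    for (;;) {
        if (send(s->fd, buf, sizeof(buf), 0) < 0) {
            s->saved_errno = errno;
            return NULL;
        }
    }
}

/*
 * Start a sender that blocks on a full socket buffer, then shut the
 * socket down from the calling thread so the join can finish.
 * Returns the errno the sender saw, or -1 if setup failed.
 */
int shutdown_unblocks_sender(void)
{
    int sv[2];
    pthread_t tid;
    struct sender s = { .fd = -1, .saved_errno = 0 };

    /* A send() on a shut-down socket would otherwise raise SIGPIPE. */
    signal(SIGPIPE, SIG_IGN);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    s.fd = sv[0];
    if (pthread_create(&tid, NULL, sender_main, &s) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Give the sender time to fill the buffer and block in send(). */
    usleep(100 * 1000);

    /* Without this call, the pthread_join() below would never return. */
    shutdown(sv[0], SHUT_RDWR);
    pthread_join(tid, NULL);

    assert(s.saved_errno != 0);
    close(sv[0]);
    close(sv[1]);
    return s.saved_errno;
}
```

Build with -pthread. The sketch only covers the send side (problem 1
above); it does nothing for receive threads waiting on a semaphore.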
> 
> -- 
> 2.31.1
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



