From: Dr. David Alan Gilbert
Subject: Re: [PATCH 1/2] multifd: use qemu_sem_timedwait in multifd_recv_thread to avoid waiting forever
Date: Mon, 29 Nov 2021 11:20:08 +0000
User-agent: Mutt/2.1.3 (2021-09-10)

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Fri, Nov 26, 2021 at 04:31:53PM +0100, Li Zhang wrote:
> > When doing live migration with 8, 16 or more multifd channels,
> > the guest hangs in the presence of network errors such as missing TCP
> > ACKs.
> >
> > At sender's side:
> > The main thread is blocked on qemu_thread_join. migration_fd_cleanup
> > is called because one thread fails in qio_channel_write_all when
> > the network problem happens, while the other send threads are blocked
> > in sendmsg and cannot be terminated. So the main thread stays blocked
> > in qemu_thread_join, waiting for those threads to terminate.
>
> Isn't the right answer here to ensure we've called 'shutdown' on
> all the FDs, so that the threads get kicked out of sendmsg, before
> trying to join the thread ?

I agree a timeout is wrong here; there is no way to pick a good timeout
value.

However, I'm a bit confused: we should be able to try a shutdown on the
receive side using the 'yank' command; that's what it's there for. Li,
does this solve your problem?

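To make the shutdown idea concrete, here is a minimal standalone sketch, not
QEMU code: a worker thread blocks in recv() on one end of a socketpair, and
the main thread calls shutdown() on that fd before joining. The names
demo_blocked_reader and demo_shutdown_then_join are hypothetical, and the
behaviour described is what I'd expect on Linux.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical worker: blocks in recv() the way a channel thread blocks on I/O. */
static void *demo_blocked_reader(void *opaque)
{
    int fd = *(int *)opaque;
    char buf[64];
    /* Nothing is ever written to the peer, so this blocks until shutdown(). */
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    return (void *)(intptr_t)n;
}

/* Returns the worker's recv() result, or -2 if the setup itself failed. */
static int demo_shutdown_then_join(void)
{
    int sv[2];
    pthread_t tid;
    void *ret = NULL;
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -2;
    }
    if (pthread_create(&tid, NULL, demo_blocked_reader, &sv[0]) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }

    /* Give the worker a moment to actually block in recv(). */
    nanosleep(&pause, NULL);

    /* Kick the worker out of recv() first, and only then join it. */
    shutdown(sv[0], SHUT_RDWR);
    pthread_join(tid, &ret);

    close(sv[0]);
    close(sv[1]);
    return (int)(intptr_t)ret;
}
```

Calling demo_shutdown_then_join() from a small main() should return 0 here,
since recv() sees end-of-stream once the socket is shut down. Without the
shutdown() call the join would wait indefinitely, which is the hang being
described.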
multifd_load_cleanup already kicks sem_sync before trying to do a
thread_join - so have we managed to trigger that on the receive side?
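The kick-then-join ordering can be sketched with plain POSIX semaphores. This
is only an illustration under my own assumptions, not the multifd code:
DemoChannel, demo_recv_thread and demo_kick_then_join are hypothetical names,
and sem_sync here is just a stand-in for the real semaphore.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-channel state; not the QEMU structure. */
typedef struct {
    sem_t sem_sync;
    atomic_bool quit;
} DemoChannel;

/* Worker: waits on the semaphore with no timeout, exits once told to quit. */
static void *demo_recv_thread(void *opaque)
{
    DemoChannel *p = opaque;

    for (;;) {
        /* Retry if the wait is interrupted by a signal. */
        while (sem_wait(&p->sem_sync) == -1 && errno == EINTR) {
        }
        if (atomic_load(&p->quit)) {
            break;
        }
        /* Normal per-sync work would go here. */
    }
    return NULL;
}

/* Returns 0 once the worker has been kicked and joined, -1 on setup failure. */
static int demo_kick_then_join(void)
{
    DemoChannel ch;
    pthread_t tid;

    atomic_init(&ch.quit, false);
    if (sem_init(&ch.sem_sync, 0, 0) != 0) {
        return -1;
    }
    if (pthread_create(&tid, NULL, demo_recv_thread, &ch) != 0) {
        sem_destroy(&ch.sem_sync);
        return -1;
    }

    /* Cleanup path: set the flag, post the semaphore, and only then join. */
    atomic_store(&ch.quit, true);
    sem_post(&ch.sem_sync);
    pthread_join(tid, NULL);

    sem_destroy(&ch.sem_sync);
    return 0;
}
```

With this ordering the worker never needs a timed wait: the post guarantees
it wakes up and sees the quit flag. It only helps a thread that is blocked on
the semaphore, though; one stuck in a socket read still needs the shutdown.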
Dave
>
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK