Re: [question]vhost-user: auto fix network link broken during migration
From: yangke (J)
Subject: Re: [question]vhost-user: auto fix network link broken during migration
Date: Tue, 24 Mar 2020 11:08:47 +0000
> > We find an issue when host mce trigger openvswitch(dpdk) restart in
> > source host during guest migration,
>
>
> Did you mean the vhost-user netdev was deleted from the source host?
The vhost-user netdev was not deleted from the source host. What I mean is:
in the normal scenario, when OVS(DPDK) begins to restart, qemu_chr disconnects
from OVS and the link status is set to link down; once OVS(DPDK) has started,
qemu_chr reconnects to OVS and the link status is set to link up. But in our
scenario, the migration finishes before qemu_chr reconnects to OVS. The
link_down of the frontend is loaded from n->status on the destination, which
causes the network in the guest to never come up again.
Backtrace of the qemu_chr disconnect path:
#0 vhost_user_write (msg=msg@entry=0x7fff59ecb2b0, fds=fds@entry=0x0,
fd_num=fd_num@entry=0, dev=0x295c730, dev=0x295c730)
at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost_user.c:239
#1 0x00000000004e6bad in vhost_user_get_vring_base (dev=0x295c730,
ring=0x7fff59ecb510)
at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost_user.c:497
#2 0x00000000004e2e88 in vhost_virtqueue_stop (dev=dev@entry=0x295c730,
vdev=vdev@entry=0x2ca36c0, vq=0x295c898, idx=0)
at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1036
#3 0x00000000004e45ab in vhost_dev_stop (hdev=hdev@entry=0x295c730,
vdev=vdev@entry=0x2ca36c0)
at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1556
#4 0x00000000004bc56a in vhost_net_stop_one (net=0x295c730,
dev=dev@entry=0x2ca36c0)
at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:326
#5 0x00000000004bcc3b in vhost_net_stop (dev=dev@entry=0x2ca36c0,
ncs=<optimized out>, total_queues=4)
at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:407
#6 0x00000000004b85f6 in virtio_net_vhost_status (n=n@entry=0x2ca36c0,
status=status@entry=7 '\a')
at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio_net.c:177
#7 0x00000000004b869f in virtio_net_set_status (vdev=<optimized out>,
status=<optimized out>)
at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio_net.c:243
#8 0x000000000073d00d in qmp_set_link (name=name@entry=0x2956d40 "hostnet0",
up=up@entry=false, errp=errp@entry=0x7fff59ecd718)
at net/net.c:1437
#9 0x00000000007460c1 in net_vhost_user_event (opaque=0x2956d40, event=4) at
net/vhost_user.c:217//qemu_chr_be_event
#10 0x0000000000574f0d in tcp_chr_disconnect (chr=0x2951a40) at qemu_char.c:3220
#11 0x000000000057511f in tcp_chr_hup (channel=<optimized out>, cond=<optimized
out>, opaque=<optimized out>) at qemu_char.c:3265
>
>
> > The VM frontend is still link down after migration, which causes the
> > network in the VM to never come up again.
> >
> > virtio_net_load_device:
> > /* nc.link_down can't be migrated, so infer link_down according
> > * to link status bit in n->status */
> > link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
> > for (i = 0; i < n->max_queues; i++) {
> > qemu_get_subqueue(n->nic, i)->link_down = link_down;
> > }
> >
> > guest: migrate begin -----> vCPU pause ---> vmstate load --->
> > migrate finish
> > ^ ^ ^
> > | | |
> > openvswitch in source host: begin to restart restarting started
> > ^ ^ ^
> > | | |
> > nc in frontend in source: link down link down link down
> > ^ ^ ^
> > | | |
> > nc in frontend in destination: link up link up link down
> > ^ ^ ^
> > | | |
> > guest network: broken broken broken
> > ^ ^ ^
> > | | |
> > nc in backend in source: link down link down link up
> > ^ ^ ^
> > | | |
> > nc in backend in destination: link up link up link up
> >
> > The link_down of the frontend is loaded from n->status; n->status is link
> > down in the source, so the link_down of the frontend is true. The backend
> > on the destination host is link up, but the frontend on the destination
> > host is link down, which causes the network in the guest to never come up
> > again until the guest is cold rebooted.
> >
> > Is there a way to auto-fix the link status? Or should we just abort the
> > migration in the virtio net device load?
>
>
> Maybe we can try to sync link status after migration?
>
> Thanks
In an extreme scenario, the OVS(DPDK) on the source may still not have
started even after the migration completes.
Our plan is to check the link state of the backend when loading the link_down
of the frontend:
     /* nc.link_down can't be migrated, so infer link_down according
      * to link status bit in n->status */
-    link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    if (qemu_get_queue(n->nic)->peer->info->type ==
+        NET_CLIENT_DRIVER_VHOST_USER) {
+        link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0 &&
+                    qemu_get_queue(n->nic)->peer->link_down;
+    } else {
+        link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    }
     for (i = 0; i < n->max_queues; i++) {
         qemu_get_subqueue(n->nic, i)->link_down = link_down;
     }
Is this good enough to auto-fix the link status?
Thanks