qemu-devel

Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
Date: Thu, 26 Dec 2013 15:48:15 +0200

On Thu, Dec 26, 2013 at 11:51:04PM +1100, Alexey Kardashevskiy wrote:
> On 12/26/2013 09:49 PM, Michael S. Tsirkin wrote:
> > On Thu, Dec 26, 2013 at 09:13:31PM +1100, Alexey Kardashevskiy wrote:
> >> On 12/25/2013 08:52 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
> >>>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
> >>>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
> >>>>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
> >>>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
> >>>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy 
> >>>>>>>>> wrote:
> >>>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
> >>>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
> >>>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy 
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>> Hi!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I am having a problem with virtio-net + vhost on a POWER7
> >>>>>>>>>>>>> machine - it does not survive a reboot of the guest.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Steps to reproduce:
> >>>>>>>>>>>>> 1. boot the guest
> >>>>>>>>>>>>> 2. configure eth0 and do ping - everything works
> >>>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
> >>>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not work 
> >>>>>>>>>>>>> at all.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The test is:
> >>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>>>>>>>> ping 172.20.1.23
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If I run tcpdump on the host's "tap-id3" interface, it shows no
> >>>>>>>>>>>>> traffic coming from the guest. If I compare how it works before
> >>>>>>>>>>>>> and after reboot: before the reboot the guest sends an ARP
> >>>>>>>>>>>>> request for 172.20.1.23 and receives the response; after the
> >>>>>>>>>>>>> reboot it sends the same request but the answer never arrives.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So you see the arp packet in guest but not in host?
> >>>>>>>>>>>
> >>>>>>>>>>> Yes.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> One thing to try is to boot debug kernel - where pr_debug is
> >>>>>>>>>>>> enabled - then you might see some errors in the kernel log.
> >>>>>>>>>>>
> >>>>>>>>>>> Tried that, and added a lot more debug printks myself; it is not
> >>>>>>>>>>> at all clear what is happening there.
> >>>>>>>>>>>
> >>>>>>>>>>> One more hint - if I boot the guest, do not bring eth0 up, and
> >>>>>>>>>>> wait more than 200 seconds (and less than 210 seconds), then eth0
> >>>>>>>>>>> will not work at all. I.e. this script produces a non-working eth0:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> ifconfig eth0 172.20.1.2 down
> >>>>>>>>>>> sleep 210
> >>>>>>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>>>>>> ping 172.20.1.23
> >>>>>>>>>>>
> >>>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to 
> >>>>>>>>>>> reproduce.
> >>>>>>>>>>>
> >>>>>>>>>>> No "vhost" == always works. The only difference I can see here is
> >>>>>>>>>>> vhost's thread, which may get suspended if not used for a while
> >>>>>>>>>>> after the start and then never wakes up, but this is almost a
> >>>>>>>>>>> blind guess.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Yet another clue - this host kernel patch seems to help with the 
> >>>>>>>>>> guest
> >>>>>>>>>> reboot but does not help with the initial 210 seconds delay:
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>>>>>>>>> index 69068e0..5e67650 100644
> >>>>>>>>>> --- a/drivers/vhost/vhost.c
> >>>>>>>>>> +++ b/drivers/vhost/vhost.c
> >>>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
> >>>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
> >>>>>>>>>>                 work->queue_seq++;
> >>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>>>>>> -               wake_up_process(dev->worker);
> >>>>>>>>>>         } else {
> >>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>>>>>>         }
> >>>>>>>>>> +       wake_up_process(dev->worker);
> >>>>>>>>>>  }
> >>>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
> >>>>>>>>
> >>>>>>>> I do not see how. I boot the guest and just wait 210 seconds, nothing
> >>>>>>>> happens to cause races.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Since it's all around startup,
> >>>>>>>>> you can try kicking the host eventfd in
> >>>>>>>>> vhost_net_start.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> How exactly? This did not help. Thanks.
> >>>>>>>>
> >>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >>>>>>>> index 006576d..407ecf2 100644
> >>>>>>>> --- a/hw/net/vhost_net.c
> >>>>>>>> +++ b/hw/net/vhost_net.c
> >>>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >>>>>>>>          if (r < 0) {
> >>>>>>>>              goto err;
> >>>>>>>>          }
> >>>>>>>> +
> >>>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
> >>>>>>>> +        struct vhost_vring_file file = {
> >>>>>>>> +            .index = i
> >>>>>>>> +        };
> >>>>>>>> +        file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
> >>>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
> >>>>>>>
> >>>>>>> No, this sets the notifier, it does not kick.
> >>>>>>> To kick you write 1 there:
> >>>>>>>       uint6_t  v = 1;
> >>>>>>>       write(fd, &v, sizeof v);
> >>>>>>
> >>>>>>
> >>>>>> Please, be precise. How/where do I get that @fd? Is what I do correct?
> >>>>>
> >>>>> Yes.
> >>>>>
> >>>>>> What
> >>>>>> is uint6_t - uint8_t or uint16_t (neither works)?
> >>>>>
> >>>>> Sorry, should have been uint64_t.
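
(For completeness, a minimal userspace sketch of that kick - assuming fd is the
host notifier descriptor obtained as in the patch above; eventfds are signalled
by writing a 64-bit counter value:)

#include <stdint.h>
#include <unistd.h>

/* Sketch only: kick (signal) an eventfd such as the vhost host notifier.
 * 'fd' is assumed to come from
 * event_notifier_get_fd(virtio_queue_get_host_notifier(...)). */
static int kick_eventfd(int fd)
{
        uint64_t v = 1;    /* eventfds carry a 64-bit counter */

        if (write(fd, &v, sizeof(v)) != sizeof(v))
                return -1;
        return 0;
}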
> >>>>
> >>>>
> >>>> Oh, that I missed :-) Anyway, this does not make any difference. Is there
> >>>> any cheap & dirty way to keep the vhost-net kernel thread always awake?
> >>>> Sending it signals from user space does not work...
> >>>
> >>> You can run a timer in qemu and signal the eventfd from there
> >>> periodically.
> >>>
> >>> Just to restate: tcpdump in the guest shows that the guest sends the ARP
> >>> packet, but tcpdump on the host's tun device does not show any packets?
> >>
> >>
> >> Ok. I figured out the business with disabling interfaces in Fedora 19. I
> >> was wrong: something is happening on the host's TAP - the guest sends the
> >> ARP request, the response is visible on the TAP interface, but it never
> >> reaches the guest.
> > 
> > Okay. So the problem is on the host-to-guest path then.
> > Things to try:
> > 
> > 1. trace handle_rx [vhost_net]
> > 2. trace tun_put_user [tun]
> > 3. I suspect some host bug in one of the features.
> > Let's try to disable some flags with device property:
> > you can get the list by doing:
> > ./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
> > Things I would try turning off are the guest offloads (the ones that start
> > with guest_), plus event_idx, any_layout and mq.
> > Turn them all off; if that helps, try to find the one that made the difference.
> 
> 
> Heh. It would still be awesome to read some basics about this vhost thing, as
> I am debugging blindly :)
> 
> Regarding your suggestions.
> 
> 1. I put "printk" in handle_rx and tun_put_user.

Fine, though it's easier with ftrace (http://lwn.net/Articles/370423/);
look for function filtering.

> handle_rx stopped being called 2:40 after the guest start,
> tun_put_user stopped 0:20 after the guest start. Accuracy is 5 seconds.
> If I bring the guest's eth0 up while handle_rx is still printing, it works,
> i.e. tun_put_user is called a lot. Once handle_rx has stopped, nothing can
> bring eth0 back to life.

OK, so what should happen is that handle_rx is called
when you bring eth0 up.
Do you see this?
The way it is supposed to work is this (see the sketch after the list):

1. vhost_net_enable_vq calls vhost_poll_start, which calls
   mask = file->f_op->poll(file, &poll->table) on the tun file.
2. That ends up in tun_chr_poll; at this point there are already packets
   queued on tun, so it returns POLLIN | POLLRDNORM.
3. vhost_poll_wakeup is called and checks the mask against the key.
4. The key is POLLIN, so vhost_poll_queue is called, which in turn
   calls vhost_work_queue.
5. The work list is either empty, in which case we wake up the worker,
   or it is not empty, in which case the worker is already running and
   will get to our job anyway.
6. This then invokes handle_rx_net.
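
For reference, the queueing step in drivers/vhost/vhost.c looks roughly like
this (a sketch reconstructed around the fragment quoted in the patch above;
the guard condition and lock handling are from memory, so treat it as
approximate):

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/sched.h>
#include "vhost.h"   /* struct vhost_dev, struct vhost_work */

/* Approximate vhost_work_queue(), reconstructed around the fragment
 * quoted in Alexey's patch. */
void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
{
        unsigned long flags;

        spin_lock_irqsave(&dev->work_lock, flags);
        if (list_empty(&work->node)) {
                /* Not queued yet: put it on the list and wake the worker. */
                list_add_tail(&work->node, &dev->work_list);
                work->queue_seq++;
                spin_unlock_irqrestore(&dev->work_lock, flags);
                wake_up_process(dev->worker);
        } else {
                /* Already queued: the running worker should pick it up. */
                spin_unlock_irqrestore(&dev->work_lock, flags);
        }
        /* Alexey's workaround above moves wake_up_process(dev->worker)
         * here, i.e. wakes the worker unconditionally. */
}

If the worker somehow misses that wakeup (or goes back to sleep without seeing
the queued work), handle_rx stops being scheduled, which would match the
symptom above and would also be consistent with the unconditional wakeup helping.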


> 2. This is exactly how I run QEMU now. I basically set "off" for every
> on/off parameter. This did not change anything.
> 
> ./qemu-system-ppc64 \
>       -enable-kvm \
>       -m 2048 \
>       -L qemu-ppc64-bios/ \
>       -machine pseries \
>       -trace events=qemu_trace_events \
>       -kernel vml312 \
>       -append root=/dev/sda3 virtimg/fc19_16GB_vhostdbg.qcow2 \
>       -nographic \
>       -vga none \
>       -nodefaults \
>       -chardev stdio,id=id0,signal=off,mux=on \
>       -device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
>       -mon id=id2,chardev=id0,mode=readline \
>       -netdev tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>       -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00,tx=timer,ioeventfd=off,\
> indirect_desc=off,event_idx=off,any_layout=off,csum=off,guest_csum=off,\
> gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,\
> host_tso4=off,host_tso6=off,host_ecn=off,host_ufo=off,mrg_rxbuf=off,\
> status=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_rx_extra=off,\
> ctrl_mac_addr=off,ctrl_guest_offloads=off,mq=off,multifunction=off,\
> command_serr_enable=off \
>       -netdev user,id=id5,hostfwd=tcp::5000-:22 \
>       -device spapr-vlan,id=id6,netdev=id5,mac=C0:41:49:4b:00:01
> 

Yes this looks like some kind of race.

> 
> -- 
> Alexey


