Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0

qemu-devel

From:	Chris Webb
Subject:	Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0
Date:	Tue, 3 Apr 2012 13:42:18 +0100
User-agent:	Mutt/1.5.20 (2009-06-14)

Stefan Hajnoczi <address@hidden> writes:

> In a case like this it might be most effective to catch a VM in the
> bad state and then go in with gdb to see what is broken.  The basic
> approach would be putting breakpoints on the e1000 device model's
> transmit/receive paths to see if the guest is giving us packets and
> whether the tap device is transmitting/receiving.  If guest and host
> appear to be working then QEMU's e1000 model must be in a bad state
> and it's a question of looking at the tx/rx rings and other hardware
> emulation state to figure out what went wrong.

Hi Stefan. I tried setting a breakpoint in start_xmit, but qemu blew up when
it was hit:

(gdb) break /home/root/packages/qemu-kvm-1.0/src-hrw66F/hw/e1000.c:start_xmit
Function "start_xmit" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) break /home/root/packages/qemu-kvm-1.0/src-hrw66F/hw/e1000.c:528
Breakpoint 1 at 0x46dcd6: file /home/root/packages/qemu-kvm-1.0/src-hrw66F/hw/e1000.c, line 528.
(gdb) cont
Continuing.

Program terminated with signal SIGTRAP, Trace/breakpoint trap.
The program no longer exists.

I assume this is some subtlety with breakpointing threaded code?

However, along these lines, I note that the guest's RX count shows it has
received some packets, but that count is stuck at 1993 bytes. The TX count
keeps climbing as I ping outbound from the guest.
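The "RX stuck, TX climbing" observation can be checked mechanically from inside the guest. Below is a minimal Python sketch, assuming a Linux guest that exposes the standard per-interface byte counters under `/sys/class/net/<iface>/statistics/`; the helper names `read_counters`, `find_stuck` and `watch` are hypothetical and not part of any existing tool.

```python
import time
from pathlib import Path

# Counters to compare; both are standard Linux sysfs network statistics.
COUNTERS = ("rx_bytes", "tx_bytes")


def read_counters(iface, sysfs_root="/sys/class/net"):
    """Read the byte counters Linux exposes in <sysfs_root>/<iface>/statistics/."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in COUNTERS}


def find_stuck(before, after):
    """Return the names of counters whose value did not change between two samples."""
    return [name for name in COUNTERS if after[name] == before[name]]


def watch(iface, interval=5.0):
    """Sample the counters twice, `interval` seconds apart, and report the stuck ones."""
    before = read_counters(iface)
    time.sleep(interval)
    after = read_counters(iface)
    return find_stuck(before, after)
```

Calling `watch("eth0")` in the guest while pinging outbound would, in the failure described here, be expected to report `rx_bytes` but not `tx_bytes`. The guest interface name `eth0` is an assumption.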

If I attach tcpdump to tap1 on the host, I see the ARP requests going out but
apparently no replies:

0024# tcpdump -i tap1
tcpdump: WARNING: tap1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tap1, link-type EN10MB (Ethernet), capture size 65535 bytes
12:08:35.654992 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:36.654976 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:37.654975 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:38.670933 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:39.670922 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:08:40.670908 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28

Capturing on br0, however, I do seem to see the replies:

12:12:53.509471 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:53.509914 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
12:12:54.509455 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:54.509875 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
12:12:55.509447 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:55.509878 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
12:12:56.525424 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:56.525854 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46
12:12:57.525408 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28
12:12:57.525837 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46

but they never reach tap1, even though STP is disabled and there is no bridge
port filtering:

  # ebtables -L
  Bridge table: filter

  Bridge chain: INPUT, entries: 0, policy: ACCEPT

  Bridge chain: FORWARD, entries: 0, policy: ACCEPT

  Bridge chain: OUTPUT, entries: 0, policy: ACCEPT

  # brctl show br0
  bridge name     bridge id               STP enabled     interfaces
  br0             8000.002590224ffa       no              eth0
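To make the tap1/br0 comparison explicit, one can count ARP requests and replies per target address in each capture. The following is a minimal Python sketch, assuming tcpdump's one-line text output in the two formats shown above (with or without the `Ethernet (len 6), IPv4 (len 4)` prefix); `arp_summary` and the sample lists are illustrative names, not part of tcpdump or qemu.

```python
import re

# "... Request who-has 84.45.8.129 tell 84.45.8.242, length 28" -> target IP
REQUEST_RE = re.compile(r"Request who-has (\S+)")
# "... Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 ..." -> IP being answered for
REPLY_RE = re.compile(r"Reply (\S+) is-at")


def arp_summary(lines):
    """Count ARP requests and replies per target IP in tcpdump text output."""
    summary = {}
    for line in lines:
        if m := REQUEST_RE.search(line):
            kind = "requests"
        elif m := REPLY_RE.search(line):
            kind = "replies"
        else:
            continue  # not an ARP request/reply line
        entry = summary.setdefault(m.group(1), {"requests": 0, "replies": 0})
        entry[kind] += 1
    return summary


# Sample lines copied from the captures above.
TAP1_SAMPLE = [
    "12:08:35.654992 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28",
    "12:08:36.654976 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28",
    "12:08:37.654975 ARP, Request who-has 84.45.8.129 tell 84.45.8.242, length 28",
]
BR0_SAMPLE = [
    "12:12:53.509471 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28",
    "12:12:53.509914 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46",
    "12:12:54.509455 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 84.45.8.129 tell 84.45.8.242, length 28",
    "12:12:54.509875 ARP, Ethernet (len 6), IPv4 (len 4), Reply 84.45.8.129 is-at 00:13:c3:35:a6:42 (oui Unknown), length 46",
]

print(arp_summary(TAP1_SAMPLE))  # {'84.45.8.129': {'requests': 3, 'replies': 0}}
print(arp_summary(BR0_SAMPLE))   # {'84.45.8.129': {'requests': 2, 'replies': 2}}
```

The same function can be fed a file of saved output, for example text captured with `tcpdump -l -i tap1 arp > tap1.txt`, by passing the open file object as `lines`.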


This looks uncannily like a kernel problem, doesn't it? Yet if I remove
-usbdevice tablet, it goes away, which is truly weird! I've just done another
hundred successful reboots without it to confirm to myself that I'm definitely
not imagining that behaviour.

> Have you tried unloading the e1000 kernel module inside the guest and
> then modprobing it again?  Does this "fix" the issue?

Hadn't thought of that, but no, it apparently has no effect. It's still broken
after I rmmod it, modprobe it again, and reconfigure the networking.

Cheers,

Chris.

Current Thread

[Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0, Chris Webb, 2012/04/02
- Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0, Stefan Hajnoczi, 2012/04/03
  - Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0, Chris Webb, 2012/04/03
    - Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0, Stefan Hajnoczi, 2012/04/03
    - Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0, Chris Webb <=
    - Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0, Stefan Hajnoczi, 2012/04/03
    - Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0, Chris Webb, 2012/04/03
    - Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0, Stefan Hajnoczi, 2012/04/03
    - Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0, Chris Webb, 2012/04/03
    - Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0, Stefan Hajnoczi, 2012/04/11
    - Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0, Chris Webb, 2012/04/12
    - Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0, Stefan Hajnoczi, 2012/04/12
    - Re: [Qemu-devel] Intermittent e1000 failure on qemu-kvm 1.0, Chris Webb, 2012/04/20
