
Re: [lwip-users] Delayed ACK behavior - Solved


From: JM
Subject: Re: [lwip-users] Delayed ACK behavior - Solved
Date: Mon, 24 Aug 2009 16:22:35 -0700 (PDT)

I have finally found the cause of all my dropped-packet and duplicate-sequence-number issues.  The fix is really very simple, and I feel a little dumb: use a switch instead of a hub.

I normally use switches, but was using a hub so I could monitor communications from my computer with Wireshark.  Even with practically no other traffic on the hub, I was apparently still getting collisions.  To make matters worse, Wireshark wasn't displaying the bad packets.  That was no big surprise to me, but I underestimated how much it affected my troubleshooting: it caused real confusion, and I was chasing after the wrong things.

After a little research here and some general Internet searches, it appears that full duplex and hubs don't mix.  I paid no attention to the duplex setting of the Ethernet controller, but my guess is that it is in full duplex right now, and that a hub could be used if I switched it to half duplex.  I'm not sure how a computer handles this; apparently it's automatic, since it doesn't seem to care whether it's connected to a hub or a switch.
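
For reference, a hub is a half-duplex device, and a PC network card normally relies on autonegotiation, falling back to half duplex when the other end does not negotiate.  The following is only a sketch of how the duplex guess above could be checked, assuming the Ethernet controller uses a standard MII PHY whose register 0 (the Basic Mode Control Register) can be read.  How that register is read is hardware-specific and not shown; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the Basic Mode Control Register (PHY register 0) as laid out in
 * IEEE 802.3 clause 22. */
#define BMCR_AUTONEG_ENABLE  (1u << 12)
#define BMCR_RESTART_AUTONEG (1u << 9)
#define BMCR_FULL_DUPLEX     (1u << 8)

/* Hypothetical helper: true when autonegotiation is off and the duplex bit
 * is set, i.e. the PHY is forced to full duplex. */
static bool phy_forced_full_duplex(uint16_t bmcr)
{
    return (bmcr & BMCR_AUTONEG_ENABLE) == 0 &&
           (bmcr & BMCR_FULL_DUPLEX) != 0;
}

/* Hypothetical helper: the value to write back to request autonegotiation
 * instead of a forced mode. */
static uint16_t bmcr_with_autoneg(uint16_t bmcr)
{
    return (uint16_t)(bmcr | BMCR_AUTONEG_ENABLE | BMCR_RESTART_AUTONEG);
}
```

Note that when autonegotiation is enabled, the duplex bit in this register does not report the negotiated result, so the first helper only detects a forced setting.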

Anyway, my audio streaming works great now!  It's amazing...now I'm finally able to move on to other areas of need and leave behind my ethernet problems.



--- On Fri, 8/21/09, Kieran Mansley <address@hidden> wrote:

From: Kieran Mansley <address@hidden>
Subject: Re: [lwip-users] Delayed ACK behavior
To: "Mailing list for lwIP users" <address@hidden>
Date: Friday, August 21, 2009, 10:59 AM

On Fri, 2009-08-21 at 05:15 -0700, JM wrote:
> SYN-SENT: ackno 6558 pcb->snd_nxt 6558 unacked 6557
> tcp_receive: ACK for 6662, unacked->seqno 6558:6662
> tcp_receive: removing 6558:6662 from pcb->unacked

All good so far.

> tcp_input: packet discarded due to failing checksum 0xe4db

This must be the root of your (wider) problems: this packet (frame 6 or
7 I think) in the packet capture has got a bad checksum, so has most
likely been corrupted by the driver.  I would look into this in much
more detail and work out why it has got the wrong checksum.  If you can
set a breakpoint here, do so, and compare the packet buffer with the
packet capture to see how they differ.
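
A minimal sketch of that comparison, assuming the bytes of the received packet and the bytes of the same frame exported from the capture have both been copied into flat arrays (a chained pbuf would first need flattening, for example with pbuf_copy_partial).  The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the offset of the first byte at which the
 * two buffers differ, or len if they are identical over len bytes. */
static size_t first_mismatch(const uint8_t *a, const uint8_t *b, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            return i;
        }
    }
    return len;
}
```

If the returned offset falls just past the TCP header, the headers match and only the payload differs, which would point at the payload handling in the driver.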

> tcp_receive: duplicate seqno 2971771236

This is where it gets weird.  This is most likely referring to frame 8
in the capture, the retransmission of frame 7 after it was dropped.  We
sent no ACK for the bad packet and this causes the other end to send the
retransmission in 8.  However, the stack is now apparently reporting
that this is a duplicate, which it can't be, because we dropped the
first one.

> tcp_input: packet discarded due to failing checksum 0x3b18
> tcp_receive: duplicate seqno 2971772867

Same again.

Here's a hypothesis as to how this could be:

Suppose that your driver sometimes got the payloads of the packets mixed
up and attached them to the wrong packet headers.  To do this it must at
some point treat headers and payloads separately, e.g. with DMAs, or
putting them in separate buffers, or something like that, and then
associate them back together incorrectly.  If it sometimes got the
header of the Nth received frame with the payload of the (N+1)th
received frame, it could plausibly explain the behaviour seen.  When
receiving the Nth frame (frame 6 in this case) it would have the wrong
payload (from frame 7), and so fail the checksum and produce that
message.  Let us assume that frame 7 was then received properly.  Then
when frame 8 came in it would look like a duplicate.
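
To illustrate why a mixed-up payload would fail the checksum, here is a simplified sketch of an RFC 1071 style ones'-complement checksum over a header and a payload held in separate buffers.  It is not lwIP's own checksum code, it leaves out the TCP pseudo-header, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds len bytes to a running sum as big-endian 16-bit words; an odd
 * trailing byte is padded with zero. */
static uint32_t sum_bytes(uint32_t sum, const uint8_t *data, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t)data[len - 1] << 8;
    }
    return sum;
}

/* Folds the carries back in to get a 16-bit ones'-complement sum. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffffu) + (sum >> 16);
    }
    return (uint16_t)sum;
}

/* Checksum a sender would store for this header and payload.
 * Assumes hlen is even, as it is for a TCP header. */
static uint16_t frame_checksum(const uint8_t *hdr, size_t hlen,
                               const uint8_t *payload, size_t plen)
{
    return (uint16_t)~fold(sum_bytes(sum_bytes(0, hdr, hlen), payload, plen));
}

/* Receiver-side check: the data plus the stored checksum must sum to
 * 0xffff. Returns nonzero when the checksum is good. */
static int frame_checksum_ok(const uint8_t *hdr, size_t hlen,
                             const uint8_t *payload, size_t plen,
                             uint16_t stored)
{
    uint32_t sum = sum_bytes(sum_bytes(0, hdr, hlen), payload, plen);

    return fold(sum + stored) == 0xffffu;
}
```

With the checksum computed over the header and payload of frame N, the check passes for that pair and fails in almost all cases if the payload of frame N+1 is attached instead.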

The only evidence to the contrary is that I'd expect the lwIP stack to
send an ACK between frames 7 and 8 if this happened, but it doesn't.
Perhaps there is another problem that produces that behaviour, or
perhaps my hypothesis is wrong - it doesn't fit with the second
retransmission so well where we ACK frames 10 and 11 as good.  Even if
it is wrong, I'll bet the problem is something like that.

One pattern (although not very reliable on such a small sample) is that
there are 3 received packets between each retransmission.

Another possibility would be if your driver is just duplicating a packet
and passing it to the stack, but that wouldn't explain the bad
checksums.

I think the key to your problem is working out where that first failed
checksum comes from, and why.  I'm guessing that if you look at the
packet given to the stack, it will have (some) data from another frame
instead of the correct payload.

Kieran




