Re: [lwip-users] FTP-DATA exchange: TCP issues

From: Jim Gibbons
Subject: Re: [lwip-users] FTP-DATA exchange: TCP issues
Date: Fri, 04 Mar 2005 12:34:20 -0800
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)

I was in error to suggest this problem. At the time that I saw this problem, the folks in question were running 0.6.3. In that version, the user was responsible for the timer, and the usual implementation just left it running, whether needed or not.

I can see what you mean about the use of the timer currently. It should get launched from the tcpip thread when needed, and that should preclude problems. Sorry about the confusion.

One other thing that had been an issue around that time was a set of data cache coherency problems related to the Ethernet DMA. We eventually turned off their data cache to avoid the confusion. Any chance that you have such a problem?

Tom C. Barker wrote:

Jim,

Not barging in at all, Jim. On the contrary, thanks for the response. I can confirm I am using lightweight protection, and I will take a look at the timer call. The call to the tcp timer is made only when the timer is _needed_, though. What would be the significance of the initial call to sys_timeout if there is no tcp connection, and so no need for a tcp timer, at startup? It would seem that a call to the tcp timer would result in it firing once, finding no need to fire again, and never rescheduling.
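A minimal sketch of the on-demand timer behaviour described above, written as a stand-alone model rather than as lwIP source. Every name prefixed with model_ is a hypothetical stand-in (for sys_timeout, tcp_tmr and the stack's connection list), and the single pending-timeout slot is an assumption made to keep the model small. It shows the timer being armed only when a connection exists, re-arming itself while one remains, and lapsing once the stack is idle.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the stack's state and timer service. */
typedef void (*timeout_fn)(void *arg);

static int active_connections;  /* models "are there any TCP connections?" */
static int timer_armed;         /* 1 while the TCP timer is scheduled */
static timeout_fn pending_fn;   /* the single pending one-shot timeout */
static int tmr_calls;           /* how many times the TCP timer has run */

/* Stand-in for sys_timeout(): records one pending one-shot callback. */
static void model_sys_timeout(timeout_fn fn) { pending_fn = fn; }

/* Stand-in for tcp_tmr(): only counts invocations. */
static void model_tcp_tmr(void) { tmr_calls++; }

/* Timer handler: do the TCP work, then re-arm only if still needed. */
static void model_timer_fired(void *arg)
{
    (void)arg;
    model_tcp_tmr();
    if (active_connections > 0) {
        model_sys_timeout(model_timer_fired);
    } else {
        timer_armed = 0;  /* idle: let the timer lapse */
    }
}

/* Called whenever a connection comes into existence. */
static void model_timer_needed(void)
{
    if (!timer_armed && active_connections > 0) {
        timer_armed = 1;
        model_sys_timeout(model_timer_fired);
    }
}

/* Advance the model by one interval; returns 1 if a timeout fired. */
static int model_tick(void)
{
    timeout_fn fn = pending_fn;
    if (fn == NULL) {
        return 0;
    }
    pending_fn = NULL;
    fn(NULL);
    return 1;
}
```

In this model nothing fires at startup, so an initial call is only significant in a design where the timer is left running unconditionally.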

Thanks again,

Tom
-----Original Message-----
From: address@hidden [mailto:address@hidden]On Behalf Of Jim Gibbons
Sent: Friday, March 04, 2005 10:51 AM
To: Mailing list for lwIP users
Subject: Re: [lwip-users] FTP-DATA exchange: TCP issues

Pardon me again for barging in. Kieran's analysis, particularly regarding an unmotivated retransmit, sounded very familiar. I had a problem like this at one of my clients. We changed two things and it then went away.

First, we found and fixed a problem with the tcp_tmr. It was running in the wrong task context. It must run in the tcpip thread. The usual method for doing this is to make the initial call to sys_timeout from within the callback function that executes when tcpip initialization is done.

Second, we found that we weren't using the lightweight protection option that I mentioned to you earlier.

I think it was actually the first thing that was causing the retransmit problem, but we never found out for sure. It's really difficult to track down resource conflicts. When the problem went away, we stopped working on it.
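The timer fix described above can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the code used at the client: the model_ functions, the context enum and the counters are hypothetical stand-ins for sys_timeout, tcp_tmr and the tcpip thread. The point it shows is that a timeout registered from the init-done callback is serviced in the same context as the rest of the stack.

```c
#include <stddef.h>

typedef void (*timeout_fn)(void *arg);
typedef void (*init_done_fn)(void *arg);

/* Hypothetical model of "which task is currently running". */
enum context { CTX_APPLICATION, CTX_TCPIP };

static enum context current_context = CTX_APPLICATION;
static timeout_fn queued_timeout;  /* one pending timeout, for simplicity */
static int tmr_runs_in_tcpip;      /* timer ran in the tcpip context */
static int tmr_runs_elsewhere;     /* timer ran in any other context */

/* Stand-in for sys_timeout(): queue a one-shot callback. */
static void model_sys_timeout(timeout_fn fn) { queued_timeout = fn; }

/* Stand-in for tcp_tmr(): records the context it was called from. */
static void model_tcp_tmr(void)
{
    if (current_context == CTX_TCPIP) {
        tmr_runs_in_tcpip++;
    } else {
        tmr_runs_elsewhere++;
    }
}

/* Periodic handler: run the TCP timer and schedule the next interval. */
static void periodic_tcp_timer(void *arg)
{
    (void)arg;
    model_tcp_tmr();
    model_sys_timeout(periodic_tcp_timer);
}

/* Init-done callback: the place where the first timeout is registered. */
static void init_done(void *arg)
{
    (void)arg;
    model_sys_timeout(periodic_tcp_timer);
}

/* Stand-in for the tcpip thread: runs the init-done callback, then
 * services up to `ticks` timeouts, all in the tcpip context. */
static void model_tcpip_thread(init_done_fn done, int ticks)
{
    enum context saved = current_context;
    current_context = CTX_TCPIP;
    done(NULL);
    while (ticks-- > 0 && queued_timeout != NULL) {
        timeout_fn fn = queued_timeout;
        queued_timeout = NULL;
        fn(NULL);
    }
    current_context = saved;
}
```

Calling model_tcpip_thread(init_done, 3) leaves tmr_runs_elsewhere at zero, which is the property the fix was after.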

Tom C. Barker wrote:
Thanks for your analysis Kieran. Forgive my assessment of 
what ACKs are what: I was speaking of the multiple ACKs 
the client sends back. ".65", the problem node, is in fact 
the lwIP ftp server.

I have all my DEBUG statements on and find that I never get
a tcp_enqueue of the missing packet. It just skips over it.
My only priority is this issue right now, so if you or anyone
has any ideas of what I can watch for, I am open to ideas. Meanwhile
I'm crafting a bit-patterned file to help identify where the 
problem is occurring.
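One possible shape for such a bit-patterned file, offered as a sketch only: each aligned 4-byte word holds its own file offset in big-endian order, so any captured payload reveals where in the file it came from. The function names and the choice of a 4-byte big-endian word are assumptions, not something described in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill buf with the pattern for file bytes [start_offset, start_offset+len).
 * Every aligned 4-byte word of the file encodes its own offset, big-endian. */
static void fill_offset_pattern(uint8_t *buf, size_t len, uint32_t start_offset)
{
    for (size_t i = 0; i < len; i++) {
        uint32_t pos = start_offset + (uint32_t)i;
        uint32_t word = pos & ~(uint32_t)3;      /* offset of enclosing word */
        unsigned shift = 24u - 8u * (pos & 3u);  /* which byte of that word */
        buf[i] = (uint8_t)(word >> shift);
    }
}

/* Recover the file offset stored in one complete, aligned 4-byte word. */
static uint32_t read_offset_word(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

With a file built this way, a payload that carries the wrong data for its sequence number shows the offset it really belongs to in its first complete word.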

Tom

-----Original Message-----
From: address@hidden
[mailto:address@hidden]On Behalf
Of Kieran Mansley
Sent: Friday, March 04, 2005 1:29 AM
To: Mailing list for lwIP users
Subject: Re: [lwip-users] FTP-DATA exchange: TCP issues


On Thu, 2005-03-03 at 09:54 -0800, Tom C. Barker wrote:
  
Hello,

Maybe to short-circuit this issue, I am working with 
0.7.2 and am in the process of moving to 1.1.0 so if 
the following problem resembles a bug prior to 1.1.0,
please let me know.

In testing an ftp implementation where I will occasionally 
successfully transfer a 400k file, I have come across a
consistently reproducible issue where my lwIP ftp server 
seems to have dropped an ACK in that according to the 
attached (truncated-packets) ethereal file, the packet on 
line 249 should have ACK'd 264364, but instead ACKs 267284. 
The rest of the (doomed) transaction is spent trying to 
shoehorn in a few packets to the client's unacked queue. 
    
Your description doesn't seem to match the trace that you've attached.
There is no packet there that ACKs 267284.  

However, there is clearly something going wrong in that data transfer.
The problem seems to me to start with packet 245, which (i) is a
retransmission (of packet 242) when none seems necessary and (ii)
doesn't have the same payload as the earlier transmission of the same
data.  Looks to me like packet 245 has got the wrong sequence number on
it, and it is in fact the payload of the next in-order packet.

Something similar happens with packets 244 and 247: 247 is a
retransmission of 244, but would not seem to be necessary, and this time
they both have the same payload.

What's more worrying is that the ".65" node then fails to retransmit the
correct data when it should: it gets many duplicate acknowledgements for
264364, which should lead it to retransmit that packet, but it refuses.
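For reference, the duplicate-acknowledgement rule being relied on here can be sketched as follows. This is a simplified model of standard TCP fast retransmit with a threshold of three duplicates, not lwIP's implementation; the struct, the function name and the omission of window, payload and sequence wrap-around checks are all simplifying assumptions.

```c
#include <stdint.h>

#define DUPACK_THRESHOLD 3u  /* conventional fast-retransmit threshold */

/* Hypothetical, minimal sender-side state. */
struct sender_state {
    uint32_t last_ack;  /* highest acknowledgement number seen so far */
    unsigned dupacks;   /* consecutive duplicates of last_ack */
};

/* Process one incoming ACK. Returns 1 exactly when the segment starting
 * at ackno should be retransmitted, otherwise 0. */
static int on_ack(struct sender_state *s, uint32_t ackno)
{
    if (ackno == s->last_ack) {
        s->dupacks++;
        return s->dupacks == DUPACK_THRESHOLD;
    }
    /* New data acknowledged: remember it and reset the duplicate count. */
    s->last_ack = ackno;
    s->dupacks = 0;
    return 0;
}
```

Feeding this model one ACK for 264364 followed by three duplicates makes the third duplicate return 1, which is the retransmission the trace never shows.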

I can't explain this in full, but hopefully that will give you some
clues about what might be wrong.  You could compare the captured
payloads against the file that is being transferred to check my theory
about 245 having the wrong sequence number.
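The comparison suggested above could be sketched like this. It assumes the whole file and one captured payload are already in memory, and that the sequence number of the first data byte on the connection is known; the function names and parameters are hypothetical. The first function checks a payload against the file offset its sequence number implies, and the second searches for where the payload actually occurs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if payload equals the file bytes at the offset implied by seq,
 * where first_data_seq is the sequence number of file byte 0. */
static int payload_matches_seq(const uint8_t *file, size_t file_len,
                               uint32_t seq, uint32_t first_data_seq,
                               const uint8_t *payload, size_t payload_len)
{
    size_t offset = (size_t)(uint32_t)(seq - first_data_seq);
    if (offset > file_len || payload_len > file_len - offset) {
        return 0;  /* the segment would run past the end of the file */
    }
    return memcmp(file + offset, payload, payload_len) == 0;
}

/* Returns the first file offset at which payload occurs, or -1 if none.
 * Only meaningful when the file content does not repeat. */
static long find_payload_in_file(const uint8_t *file, size_t file_len,
                                 const uint8_t *payload, size_t payload_len)
{
    if (payload_len == 0 || payload_len > file_len) {
        return -1;
    }
    for (size_t off = 0; off + payload_len <= file_len; off++) {
        if (memcmp(file + off, payload, payload_len) == 0) {
            return (long)off;
        }
    }
    return -1;
}
```

If the first check fails for a packet while the search finds its payload one segment further into the file, that would support the wrong-sequence-number theory.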

Kieran



_______________________________________________
lwip-users mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/lwip-users


--

Jim Gibbons
address@hidden

Gibbons and Associates, Inc.
TEL: (408) 984-1441

900 Lafayette, Suite 704, Santa Clara, CA
FAX: (408) 247-6395