Subject: [lwip-devel] lwIP 1.4.1 stable tcp connection stall
Date: Wed, 9 Aug 2017 14:21:22 +0000
First, thanks to everyone for the continued development and support of lwIP. It has been great to see it so active over the past few years. The purpose of this message is to notify lwIP 1.4.1 stable users of a problem, and to see if anyone (i.e. developers) knows which bug report would have resolved it. The problem is that outbound TCP communication stalls, triggered by some network traffic that I have been unable to pinpoint. From the debug logs I cannot tell what lwIP is trying to do on each tcp_output call, but nothing goes out on the wire. Other packets still come and go from the device: I can still open telnet, ping, and use a UDP protocol on the device. We use NO_SYS=1, a cooperative multi-tasking system, UDP and a single TCP connection. 1.4.1 was great for over 5 years.
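For anyone trying to reproduce this on 1.4.1, the TCP debug output mentioned above is switched on from lwipopts.h. The fragment below is only a minimal sketch: it assumes the port already routes LWIP_PLATFORM_DIAG to a console or log, and the particular selection of flags is a suggestion for narrowing down a send stall, not a record of the exact settings used on our devices.

```c
/* lwipopts.h (fragment). Hedged sketch: the choice of flags is an assumption. */
#define LWIP_DEBUG          1                   /* compile in LWIP_DEBUGF output */
#define LWIP_DBG_MIN_LEVEL  LWIP_DBG_LEVEL_ALL  /* do not filter by severity */
#define LWIP_DBG_TYPES_ON   LWIP_DBG_ON         /* enable all message types */

#define TCP_DEBUG           LWIP_DBG_ON         /* general TCP messages */
#define TCP_OUTPUT_DEBUG    LWIP_DBG_ON         /* what tcp_output() decides to send */
#define TCP_CWND_DEBUG      LWIP_DBG_ON         /* congestion window changes */
#define TCP_WND_DEBUG       LWIP_DBG_ON         /* window updates */
#define TCP_RTO_DEBUG       LWIP_DBG_ON         /* retransmission timeouts */
#define TCP_QLEN_DEBUG      LWIP_DBG_ON         /* send queue length */
```

With these enabled, the output, congestion-window and queue-length messages should at least show whether the segment is being held back by the window, by the queue, or never reaches the netif at all.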
We install systems on a local subnet (only a PC NIC and lwIP/Lantronix devices, anywhere from 2 to 10). A critical customer has been complaining for months about our devices disconnecting. We report a disconnect error when we stop getting the repeating status messages back from all devices. I’d heard of this occurring intermittently over the years, and we always wrote it off as electrical problems since we’re usually in a noisy environment. That changed when, by chance, I connected my local subnet switch to our corporate network and saw disconnects on all the lwIP devices I had connected. I don’t know why. This customer must have the same kind of traffic on their subnet that I see on the corporate network.
The first thing I did was upgrade to 2.0.2. Apart from a few minor changes, everything builds and runs, and the TCP send stalls are gone. I went back to lwIP 1.4.1 and they came back. Good: I had a test and a solution. We decided that the best approach is to patch 1.4.1 with the fix for the critical customer, and then use a controlled rollout and test plan for lwIP 2.0.2, which means updating 9 of our lwIP devices. I spent about half a day checking the CHANGELOG and trying a few patches from the bug reports that mention TCP, but no change I made resolved the problem. The one mention of TCP stalling concerned the new window scaling feature in lwIP 2.x. I would have thought a bug fix for stalled TCP sends would be easy to find in the list, since this is a big deal in a TCP/IP stack.
My question to the developers: does anyone recall a change that resolved TCP send stalling? And a note to lwIP 1.4.1 stable users: you should update to 2.x.
Thank you – best regards,