Re: [lwip-users] tcp_active_pcbs corrupt after resetting connection ???

From: Simon Goldschmidt
Subject: Re: [lwip-users] tcp_active_pcbs corrupt after resetting connection ???
Date: Wed, 27 Mar 2019 22:12:52 +0100
On 27.03.19 22:00, Terence Darwen wrote:
Hi Simon - Thanks for the reply.  I've made my own replies below:

 > Hi, I'm using lwIP 1.4.1 with a Texas Instruments Tiva Launchpad
 > board (the TM4C1294).
 >>Now that's a really old version of lwIP!

Yes, unfortunately 1.4.1 is the version packaged in TI's latest
release of Tivaware, rather than a more recent one.

 > [..]
 > This code is run in the lwIPHostTimerHandler.  Which of course is called
 > from the Ethernet interrupt handler.

 >>Wait a minute, "of course"? Have you read this (mostly valid for 1.4.x
 >>as well):

Yes, I've been working with lwIP on the Tiva for some time now and have
been aware of these instructions for quite a while and have taken great
care to make sure all lwIP calls (all tcp_* calls) are made on a single
"thread", this being the Tiva's Ethernet interrupt.  For example, when
needing to send new data outside of the Ethernet interrupt I place the
data in a thread safe queue.  This queue is then processed (i.e. the
data from the queue is sent using tcp_write and tcp_output) only during
the Tiva's Ethernet interrupt.
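
For readers following along, the queue hand-off described above could look roughly like the sketch below. It is not the code from my project: the names txq_t, txq_push, txq_drain and stub_send are made up for illustration, and it assumes exactly one producer (the main loop) and one consumer (the Ethernet interrupt). On the target, the send hook would wrap tcp_write() (for example with TCP_WRITE_FLAG_COPY) and the interrupt would call tcp_output() after draining; here a host-side stub stands in so the sketch compiles without lwIP.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TXQ_SLOTS   8u    /* number of queued messages, power of two */
#define TXQ_MSG_MAX 64u   /* largest message the queue accepts */
static_assert((TXQ_SLOTS & (TXQ_SLOTS - 1u)) == 0u, "TXQ_SLOTS must be a power of two");

typedef struct {
    uint16_t len;
    uint8_t  data[TXQ_MSG_MAX];
} txq_msg_t;

typedef struct {
    txq_msg_t   slot[TXQ_SLOTS];
    atomic_uint head;   /* written only by the producer (main loop) */
    atomic_uint tail;   /* written only by the consumer (Ethernet ISR) */
} txq_t;

/* Hypothetical send hook; on target it would wrap tcp_write(). */
typedef bool (*txq_send_fn)(void *ctx, const void *data, uint16_t len);

/* Producer side: copy one message in. Returns false if it is too big
 * or the queue is full. Never touches lwIP. */
static bool txq_push(txq_t *q, const void *data, uint16_t len)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (len > TXQ_MSG_MAX || head - tail == TXQ_SLOTS) {
        return false;
    }
    txq_msg_t *m = &q->slot[head & (TXQ_SLOTS - 1u)];
    memcpy(m->data, data, len);
    m->len = len;
    /* Publish the slot only after its contents are complete. */
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: hand queued messages to the send hook in FIFO order.
 * A message the hook refuses stays queued for the next call. */
static unsigned txq_drain(txq_t *q, txq_send_fn send, void *ctx)
{
    unsigned sent = 0u;
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    while (tail != atomic_load_explicit(&q->head, memory_order_acquire)) {
        const txq_msg_t *m = &q->slot[tail & (TXQ_SLOTS - 1u)];
        if (!send(ctx, m->data, m->len)) {
            break;
        }
        tail++;
        atomic_store_explicit(&q->tail, tail, memory_order_release);
        sent++;
    }
    return sent;
}

/* Host-side stand-in for the hook: accepts bytes until a budget runs
 * out, loosely mimicking a full TCP send buffer. */
typedef struct { unsigned budget; unsigned bytes; } stub_ctx_t;

static bool stub_send(void *ctx, const void *data, uint16_t len)
{
    stub_ctx_t *s = ctx;
    (void)data;
    if (len > s->budget) {
        return false;
    }
    s->budget -= len;
    s->bytes  += len;
    return true;
}
```

The point of the design is that txq_push never calls into lwIP, so the only context that touches the stack is whichever one calls txq_drain.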

 > Based on other examples, I see no problem with this.

 >>Ehrm, based on which examples?

TI's Tivaware contains examples of using lwIP where the
lwIPHostTimerHandler function makes tcp_ calls.  The examples
have lwIPEthernetIntHandler() as the ISR; this calls lwiplib.c's
lwIPServiceTimers(), which calls lwIPHostTimerHandler().

 >>When running code from ETH interrupt
 >>handler, you have to *know* what you are doing! Basically this means:
 >>*no* access into lwIP from any other interrupt priority or main loop
 >>*unless* the ethernet interrupt is disabled.
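
The rule quoted above (touch lwIP from another context only while the Ethernet interrupt is disabled) can be sketched as a small guard. This is only an illustration under stated assumptions: eth_irq_disable, eth_irq_enable and lwip_call_guarded are hypothetical names, and the two hooks are host-side stubs. On Tivaware they would presumably map to IntDisable(INT_EMAC0) and IntEnable(INT_EMAC0), which is an assumption to check against the port in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port hooks, stubbed with a counter so the sketch runs
 * on a host. A real port would mask the Ethernet interrupt here. */
static int eth_irq_mask_depth;

static void eth_irq_disable(void)   { eth_irq_mask_depth++; }
static void eth_irq_enable(void)    { eth_irq_mask_depth--; }
static bool eth_irq_is_masked(void) { return eth_irq_mask_depth > 0; }

/* Work that needs to call into lwIP (tcp_write, tcp_output, ...). */
typedef void (*lwip_work_fn)(void *arg);

/* Run fn with the Ethernet interrupt masked, so the ISR cannot enter
 * lwIP while fn is walking or changing its internal lists. */
static void lwip_call_guarded(lwip_work_fn fn, void *arg)
{
    assert(fn != NULL);
    eth_irq_disable();
    fn(arg);
    eth_irq_enable();
}

/* Example work item: records whether it ran with the interrupt masked. */
static void record_mask_state(void *arg)
{
    *(bool *)arg = eth_irq_is_masked();
}
```

Note that the guard only helps if every non-ISR call into lwIP goes through it. The counter in the stub tolerates nested calls, but a plain interrupt enable/disable pair on real hardware typically does not nest, so nested use would need extra care.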

Totally agree.  I've taken great care in my code to always do this and,
as far as I can tell, this is indeed what I am doing.  I never do any
calls into lwIP from anywhere except the Ethernet interrupt handler.

 > However, intermittently, it appears to corrupt the
 > lwIP's tcp.c's tcp_active_pcbs linked list.

 >>And I would have written that as an example if you hadn't watched out
 >>for what I wrote in the lines above ;-)

Right, you're saying the tcp_active_pcbs linked list being corrupted
like this is a common result of multiple execution contexts in lwIP
code.  I could understand that.

 >>Try to clean up your code (in terms of execution threads) and if you
 >>can, think about upgrading to a more recent version.

The code is quite clean, and, as an experienced developer, I've reviewed
it many times to verify the logic is correct and no thread issues
exist.  Other than this issue I'm having, the 1.4.1 lib is performing
very well.  I'm doing a lot of communication with a number of clients
over long periods of time without any issues.  Unfortunately, using
non-Tivaware 3rd party code is probably not an option for me at this
point.  I do understand 1.4.1 is quite an old version of lwIP so if the
answer boils down to "upgrade" being the advised solution I could
understand this.

Well, I'm sorry to say I cannot tell you what's the reason for the bug
you're seeing. I cannot recall a specific bug leading to this, but of
course it could be that it's a bug that has been fixed by now. However,
using lwIP like that (only in the ETH interrupt) is quite uncommon and
hard to review, so I wouldn't put concurrency problems aside as a reason


