I think I've gotten closer to the problem. It seems that the segments
queued up to be sent by tcp_output() are larger than the 'wnd' size
(which is pcb->snd_wnd in this case), and thus they don't get sent.
I'm not sure how this could have happened.
Specifically, this while() loop (line 969 in tcp_out.c, in tcp_output())
never gets entered, even though seg is non-NULL:
while (seg != NULL &&
ntohl(seg->tcphdr->seqno) -
pcb->lastack + seg->len <= wnd) {
because seg->len is bigger than wnd. And the problem accumulates: as
segments don't get sent, they just get added to the unsent segment.
Eventually it just hangs the outgoing TCP stream, since it deadlocks
with this logic. I don't see a way out of this. It seems the segments
should not accumulate if they are longer than the wnd size.
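To make the stall concrete, here is a minimal sketch of the same window check in isolation. It is not lwIP code: the struct and the function names (model_seg, model_seg_fits, model_count_sendable) are made up for illustration, and sequence numbers are assumed to already be in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields the loop condition reads. */
struct model_seg {
    uint32_t seqno; /* sequence number of the first byte, host byte order */
    uint16_t len;   /* payload length in bytes */
};

/* Same test as the quoted loop condition: seqno - lastack + len <= wnd. */
static bool model_seg_fits(const struct model_seg *seg, uint32_t lastack,
                           uint32_t wnd)
{
    assert(seg != NULL);
    return (uint32_t)(seg->seqno - lastack + seg->len) <= wnd;
}

/* Counts how many segments would be sent, stopping at the first one that
 * does not fit, the way the while() loop stops. */
static size_t model_count_sendable(const struct model_seg *queue, size_t n,
                                   uint32_t lastack, uint32_t wnd)
{
    size_t sent = 0;
    while (sent < n && model_seg_fits(&queue[sent], lastack, wnd)) {
        sent++;
    }
    return sent;
}
```

With a 600-byte segment at the head of the queue and wnd = 536, model_count_sendable() returns 0 no matter how many small segments sit behind it, which matches the behaviour I'm seeing.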
I looked at the LWIP_STAT printout, and everything looked
good: nothing hitting any limits and no errors at all.
It appears to me that this is a configuration problem, and not a weird
stack overflow/memory corruption issue that I had originally suspected.
Any ideas on why the segments are accumulating instead of being sent?
I'm sure I can answer this question by studying the code for a few days
or weeks, but I'm hoping someone smarter than me has seen this before.
This code is based on the ST STM32F4xx port of LWIP. I'm using the head
of the LWIP source for the "ppp-new" branch, and FreeRTOS 7.2.
I suspected the ST code might be buggy and not queuing the DMA transfers
correctly, but because the higher level code isn't even *trying* to send
new segments, I suspect the ST code less.
-Mark
On 12/18/2012 4:56 PM, Mark Lakata wrote:
Hi Yuantut,
Thanks for the message. I have stack checking on (FreeRTOS option 2),
and I also doubled the size of the HTTP stack, and that did not help.
I did a static stack analysis, and the original size of my stack should
be good enough. Double that is overkill. If the stack is getting
overwritten, then several bugs have to be happening simultaneously. In
any event, the code is running in a tight loop in a single function, so
the http stack should not be growing with time, and it doesn't make
sense that this happens after 1500 iterations. I could believe other
stacks are getting corrupted, but I also increased the sizes of those
stacks (i.e. the tcpip task), with no change. My first thought *was*
stack overflow, but none of the debug checks I have in place suggest
that.
I've turned on all the debug messages to see if there is
something strange going on. (Unfortunately, if I turn on
'pbuf' debug messages, the stack stops working... I think my
115200 baud serial port can't keep up with the number of pbuf
debug messages, due to random ethernet traffic.)
I've got some more trace information. I know that the mbox_fetch
in the tcpip stack is happening, so the tcpip is getting the
message from lwip_write(). It looks like the problem is in
lwip_netconn_do_writemore(), because it seems that sometimes it
does not call the sys_sem_signal(&conn->op_completed);
function at the end (because write_finished is 0). I've put in
break points, once I get to the part of the file that causes it to
hang, and this function does not get called. If this does not get
called, then the calling thread will hang.
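As a rough model of what I think I'm seeing (this is a sketch, not the lwIP source; model_write and model_writemore are hypothetical names), a writemore-style step only signals the caller once the whole request has been queued. If no send-buffer space ever becomes free, the signal never comes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one pending write; not lwIP's real structures. */
struct model_write {
    size_t len;       /* total bytes the caller asked to write */
    size_t offset;    /* bytes handed to TCP so far */
    unsigned signals; /* times the completion semaphore was signalled */
};

/* One pass: queue as much as sndbuf_free allows, and signal the caller
 * only when the whole request has been queued. */
static bool model_writemore(struct model_write *w, size_t sndbuf_free)
{
    assert(w != NULL && w->offset <= w->len);
    size_t remaining = w->len - w->offset;
    size_t chunk = remaining < sndbuf_free ? remaining : sndbuf_free;
    w->offset += chunk;

    bool write_finished = (w->offset == w->len);
    if (write_finished) {
        /* Stands in for sys_sem_signal(&conn->op_completed). */
        w->signals++;
    }
    return write_finished;
}
```

In this model, calling model_writemore() with sndbuf_free == 0 forever leaves signals at 0, so a thread waiting on the semaphore with an infinite timeout would never wake up.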
-Mark
On 12/18/2012 3:49 PM, yuantu Huang wrote:
Hi Mark,
Can you increase the http server task stack size and then have a try?
Yuantu
From: Mark Lakata <address@hidden>
To: address@hidden
Sent: Wednesday, 19 December 2012 9:32 AM
Subject: [lwip-users] socket write hangs, in LwIP 1.4.? (ppp-new branch but not ppp related)
Hi,
I'm having trouble with my http webserver, if I
write a lot of data to the outgoing socket. I'm
basically copying a file that is being POST'ed to the
http server back to the http client. Very predictably,
after 69K of data has been read and 79K of data has
been written (basically a copy of the input + HTML
dressing), the final call to sockets.c:lwip_write()
hangs.
I've traced the hang to the infinite timeout at
lwip_write()
lwip_send()
netconn_write_partly()
TCPIP_APIMSG()
tcpip.c:tcpip_apimsg()
sys_arch_sem_wait(&apimsg->msg.conn->op_completed, 0); <- here
I've disabled almost everything in my application,
except for the LWIP related code and some trivial code
that flashes the LEDs and is unrelated. When I dump
the state of the FreeRTOS stack, I can see everything
is happily chugging along ... except for my suspended
HTTP task. So the stack is not wedged, but it seems to
have ignored, or not replied to, my API call.
state name-------- free-- stk_top- where-------------------------------------
Rdy0  IDLE            452 20000E7C tasks(1933:4)
*Rdy3 tcpip_thread   5844 20002E9C sys_arch(192:2)
Dlyd  Eth_if         3084 20003CAC xQueueGenericReceive/ethernetif(331:5)
Dlyd  LEDx             44 20006D44 vTaskDelay/led_task(34:13)
Dlyd  MAIN           1164 20000BB4 vTaskDelay/housekeeping(57:9)
Dlyd  Tmr Svc         916 200013C4 timers(404:6)
Susp  HTTP           3148 200049E4 xQueueGenericReceive/sys_arch(313:3)
-----------------------------------------------------------------------------
The code is basically calling read(socket, ...),
buffering it into single lines, then echoing the
single lines to write(socket, ...). If I echo
everything, it hangs pretty repeatably at the same
line number of the data it reads from the socket, around
line 1500. If I don't echo everything, but just summarize
the output, then it doesn't hang.
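For reference, the buffering logic is roughly like the sketch below. This is not my actual code: the names (line_buf, line_buf_feed, count_writer) and the 128-byte buffer size are made up for illustration, and the writer callback stands in for the call to write(socket, ...).

```c
#include <assert.h>
#include <stddef.h>

#define ECHO_LINE_MAX 128 /* hypothetical line buffer size */

/* Callback that sends bytes; the real program would wrap write(socket, ...). */
typedef int (*line_writer)(void *ctx, const char *data, size_t len);

struct line_buf {
    char data[ECHO_LINE_MAX];
    size_t used;
};

/* Feed one chunk returned by read(); echo each complete line through out.
 * Returns 0 on success, or the first negative value returned by out. */
static int line_buf_feed(struct line_buf *lb, const char *chunk, size_t n,
                         line_writer out, void *ctx)
{
    assert(lb != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        lb->data[lb->used++] = chunk[i];
        /* Flush on newline, or when the buffer is full (overlong line). */
        if (chunk[i] == '\n' || lb->used == ECHO_LINE_MAX) {
            int rc = out(ctx, lb->data, lb->used);
            lb->used = 0;
            if (rc < 0) {
                return rc;
            }
        }
    }
    return 0;
}

/* Example writer that only counts calls and bytes (stands in for the socket). */
struct echo_stats {
    size_t calls;
    size_t bytes;
};

static int count_writer(void *ctx, const char *data, size_t len)
{
    struct echo_stats *s = ctx;
    (void)data;
    s->calls++;
    s->bytes += len;
    return 0;
}
```

The point is that every input line turns into at least one small write, so the number of write calls grows with the number of lines, not with the number of reads.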
I don't see any exception messages, no memory
errors, no nothing on my debug output. Here's what it
looks like-- you can see the last line before it hangs
is the call to lwip_send, without the confirmation
back from the tcpip thread.
... <snip several hundred calls to lwip_send (about 3000)> ....
lwip_recvfrom(1, 2001d544, 1500z, 0x0, ..)
lwip_recvfrom: top while sock->lastdata=2001798c
lwip_recvfrom: buflen=1260 len=1500z off=0 sock->lastoffset=480
lwip_recvfrom: deleting netbuf=2001798c
lwip_recvfrom: top while sock->lastdata=0
lwip_recvfrom: netconn_recv err=0, netbuf=20017f98
lwip_recvfrom: buflen=1260 len=720z off=780 sock->lastoffset=0
lwip_recvfrom(1): addr=192.168.0.125 port=61108 len=1500
lwip_recvfrom: lastdata now netbuf=20017f98
:1060A000020080B2EEE740F00400FAE740F008009A
lwip_send(1, data="" size=44z, flags=0x0)
lwip_send(1) err=0 written=44z
lwip_send(1, data="" size=5z, flags=0x0)
lwip_send(1) err=0 written=5z
:1060B000F7E740F01000F4E72A290DD131680A1DF6
lwip_send(1, data="" size=44z, flags=0x0)
lwip_send(1) err=0 written=44z
lwip_send(1, data="" size=5z, flags=0x0)
lwip_send(1) err=0 written=5z
:1060C00032600968F962002904D54942F96240F05A
lwip_send(1, data="" size=44z, flags=0x0)
lwip_send(1) err=0 written=44z
lwip_send(1, data="" size=5z, flags=0x0)
lwip_send(1) err=0 written=5z
:1060D000040080B26D1C12E00021F96209E0F96A47
lwip_send(1, data="" size=44z, flags=0x0)
lwip_send(1) err=0 written=44z
lwip_send(1, data="" size=5z, flags=0x0)
lwip_send(1) err=0 written=5z
:1060E000494505D001EB810302EB43013039F962E8
lwip_send(1, data="" size=44z, flags=0x0)
lwip_send(1) err=0 written=44z
lwip_send(1, data="" size=5z, flags=0x0)
lwip_send(1) err=0 written=5z
:1060F0006D1C2A78A2F13001C9B20A29EFD32978A0
lwip_send(1, data="" size=44z, flags=0x0)
lwip_send(1) err=0 written=44z
lwip_send(1, data="" size=5z, flags=0x0)
lwip_send(1) err=0 written=5z
:106100002E2903D04FF0FF31B9621DE015F8011FB1
lwip_send(1, data="" size=44z, flags=0x0)
_______________________________________________
lwip-users mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/lwip-users