On Sun, 2009-08-23 at 15:35 +0300, Eran Rundstein wrote:
> This flow goes on for a while, and then suddenly lwip_write does not
> return - looking at the thread's stack, it is indeed blocking on the
> op_completed semaphore. Nothing seems to signal it - what causes this
> is still a mystery to me.
Sounds like there are two problems here. The first is as described
above, and is probably the more serious. Can you get a packet capture
to illustrate this? Very little has changed since 1.3.1RC1, so I'd be
surprised if updating fixed it, but it would be a good idea to check
against current CVS.
> At this stage, I forcefully close the connection from the Linux side.
> This will cause err_tcp() to be called from within the context of LWIP
> thread (tcpip_thread()). err_tcp() first attempts to post data to the
> connection's mbox with:
> if (conn->recvmbox != SYS_MBOX_NULL) {
>   /* Register event with callback */
>   API_EVENT(conn, NETCONN_EVT_RCVPLUS, 0);
>   sys_mbox_post(conn->recvmbox, NULL);
> }
>
> And afterwards it may or may not signal the completion semaphore. Now,
> sys_mbox_post does not return until the message is posted to the
> queue. Assuming the queue is full, it will not return until a message
> is read from the queue and space is made available.
This second problem seems a bit more straightforward. We need either a
way to postpone the post to the recvmbox when the mbox is full, or a
change to the mbox so that NULL can always be posted. I prefer the latter.
Either way, could you file a bug for this part of the problem?
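
To illustrate what "NULL can always be posted" might look like, here is a
rough, single-threaded sketch. It is not lwIP code: toy_mbox, TOY_MBOX_SIZE
and the toy_mbox_* functions are all made-up names, and a real port would
need the usual locking and semaphores around this. The idea is just that
the close notification lives in a flag rather than in a queue slot, so
posting it cannot block on a full queue, and the reader sees NULL only
after draining the queued data.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MBOX_SIZE 4  /* hypothetical capacity, for illustration only */

/* Toy bounded mailbox: a ring buffer plus a separate "closed" flag. */
typedef struct {
    void  *slots[TOY_MBOX_SIZE];
    size_t head;    /* index of the oldest queued message */
    size_t count;   /* number of queued messages */
    bool   closed;  /* a NULL "connection closed" marker is pending */
} toy_mbox;

void toy_mbox_init(toy_mbox *m)
{
    m->head = 0;
    m->count = 0;
    m->closed = false;
}

/* Ordinary post: fails (returns false) instead of blocking when full. */
bool toy_mbox_trypost(toy_mbox *m, void *msg)
{
    if (m->count == TOY_MBOX_SIZE) {
        return false;
    }
    m->slots[(m->head + m->count) % TOY_MBOX_SIZE] = msg;
    m->count++;
    return true;
}

/* Post the NULL marker: only sets a flag, so it succeeds even when full. */
void toy_mbox_post_closed(toy_mbox *m)
{
    m->closed = true;
}

/* Fetch: queued data first, then the NULL marker exactly once. */
bool toy_mbox_tryfetch(toy_mbox *m, void **msg)
{
    if (m->count > 0) {
        *msg = m->slots[m->head];
        m->head = (m->head + 1) % TOY_MBOX_SIZE;
        m->count--;
        return true;
    }
    if (m->closed) {
        m->closed = false;
        *msg = NULL;
        return true;
    }
    return false;  /* nothing queued and no marker pending */
}
```

With something along these lines, the error path in the tcpip thread
would call the equivalent of toy_mbox_post_closed() and return
immediately, regardless of how much unread data is still queued.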