
[lwip-devel] [bug #50837] LWIP TCP/IP race condition

From: preet
Subject: [lwip-devel] [bug #50837] LWIP TCP/IP race condition
Date: Fri, 21 Apr 2017 14:13:20 -0400 (EDT)

Follow-up Comment #9, bug #50837 (project lwip):


You are absolutely right.  I did not look through the Wireshark trace in detail
and hypothesized that the window was shrinking down to zero.  So the
correction here is that the window size of the Win/Python client kept
shrinking until it stopped sending ACKs, as you mentioned.

The mailbox errors are coming from:

err_t sys_mbox_trypost( sys_mbox_t *pxMailBox, void *pxMessageToPost )

I don't think it is a matter of the mbox implementation not handling a full
window's worth of data, because the mailbox in question queues a pointer,
not the entire window (it is probably the pbuf pointer being queued?).  The
recvmbox is likely full because the RTOS task is stuck inside the send()
function.  I also see that in this situation the TCP/IP dropped packet count
starts to go up.  I will search for pcb->refused_data next.
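
To make the "queue of pointers" point concrete, here is a minimal sketch of a
bounded mailbox whose non-blocking post fails once every slot is taken.  It is
not the lwIP or RTOS port code: the demo_mbox_* names and the slot count are
hypothetical, chosen only to show that the capacity is counted in messages,
not in bytes of window.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity, standing in for the configured recvmbox size. */
#define DEMO_MBOX_SLOTS 4

typedef struct {
    void  *slots[DEMO_MBOX_SLOTS]; /* each slot holds one pointer, not the payload */
    size_t head;                   /* index of the oldest queued pointer */
    size_t count;                  /* number of occupied slots */
} demo_mbox_t;

static void demo_mbox_init(demo_mbox_t *mbox)
{
    assert(mbox != NULL);
    mbox->head = 0;
    mbox->count = 0;
}

/* Non-blocking post: returns 0 on success, -1 if every slot is taken. */
static int demo_mbox_trypost(demo_mbox_t *mbox, void *msg)
{
    if (mbox->count == DEMO_MBOX_SLOTS) {
        return -1; /* full: the caller has to drop or hold on to the message */
    }
    mbox->slots[(mbox->head + mbox->count) % DEMO_MBOX_SLOTS] = msg;
    mbox->count++;
    return 0;
}

/* Non-blocking fetch: returns 0 and stores the oldest pointer, -1 if empty. */
static int demo_mbox_tryfetch(demo_mbox_t *mbox, void **msg)
{
    if (mbox->count == 0) {
        return -1;
    }
    *msg = mbox->slots[mbox->head];
    mbox->head = (mbox->head + 1) % DEMO_MBOX_SLOTS;
    mbox->count--;
    return 0;
}
```

In this sketch a post starts failing as soon as nobody fetches, no matter how
small the queued buffers are, which matches the symptom of a receiver task
that is stuck elsewhere.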

We should not make the mailbox queue bigger, because otherwise it
starts to eat up all of the LWIP buffer pools and locks up the entire LWIP
stack.  More specifically, if I increase _DEFAULT_TCP_RECVMBOX_SIZE_ then it
starts to chew on the memory pools, and eventually pings and other sockets
become non-operational.  So this is the deadlock I was talking about when I
opened up the issue.  I disagree that we should resolve the situation by
asking the programmer to read the data, because LWIP is blocking the send()
operation.  Maybe the next line after send() could be a read, but while a
blocking call to send() is in progress, the receive mailboxes should be
serviced somehow and not chew up the buffers as defined by
_DEFAULT_TCP_RECVMBOX_SIZE_.  I don't think I am misusing LWIP or the socket
API by any means.  The RTOS combined with LWIP should be able to guard itself
from this.
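
For readers following the thread, the application-side alternative being
debated (reading incoming data while a send is still in progress) might look
roughly like the sketch below.  It is written against POSIX sockets with
select() and MSG_DONTWAIT rather than taken from lwIP, the name
demo_send_while_draining is hypothetical, and discarding the received bytes is
a simplification; a real client would process them.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Send len bytes on fd while reading (and discarding) whatever the peer
 * sends back.  Returns the number of bytes sent (len on success), or -1 on
 * error, peer close, or when nothing happens for timeout_s seconds.
 * *drained receives the number of incoming bytes that were discarded. */
static long demo_send_while_draining(int fd, const char *buf, size_t len,
                                     int timeout_s, size_t *drained)
{
    size_t sent = 0;
    char scratch[256];

    *drained = 0;
    while (sent < len) {
        fd_set rfds, wfds;
        struct timeval tv;

        tv.tv_sec = timeout_s;
        tv.tv_usec = 0;
        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        FD_SET(fd, &rfds);
        FD_SET(fd, &wfds);

        /* Wait until the socket can be read or written, or give up. */
        if (select(fd + 1, &rfds, &wfds, NULL, &tv) <= 0) {
            return -1;
        }
        if (FD_ISSET(fd, &rfds)) {
            ssize_t n = recv(fd, scratch, sizeof scratch, MSG_DONTWAIT);
            if (n > 0) {
                *drained += (size_t)n;   /* receive queue is being emptied */
            } else if (n == 0 || (errno != EAGAIN && errno != EWOULDBLOCK)) {
                return -1;               /* peer closed or hard error */
            }
        }
        if (FD_ISSET(fd, &wfds)) {
            ssize_t n = send(fd, buf + sent, len - sent, MSG_DONTWAIT);
            if (n > 0) {
                sent += (size_t)n;
            } else if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
                return -1;
            }
        }
    }
    return (long)sent;
}
```

This only shows what "read while sending" costs the application; it says
nothing about whether the stack itself should protect against the stall.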

Thanks again Joel for looking into the Wireshark trace.  Like I mentioned, I
put in a workaround by forcing a blocking timeout of 5 seconds on EVERY
socket, but we should be able to gracefully handle this deadlock.  I do not
know the LWIP code in detail, but my gut feeling says that we should handle
the RST more gracefully, and we need to figure out which code is consuming
those mailbox slots through the _sys_mbox_trypost_ function.
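
The post does not say how the 5 second timeout was applied.  One plausible
way, assuming the standard SO_RCVTIMEO and SO_SNDTIMEO socket options, is
sketched below with POSIX headers.  On lwIP these options are only available
when LWIP_SO_RCVTIMEO and LWIP_SO_SNDTIMEO are enabled, and the expected
option value type can depend on the port configuration, so treat the
struct timeval here as an assumption.  The helper name is hypothetical.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Apply the same send and receive timeout to one socket.
 * Returns 0 on success, -1 if either setsockopt() call fails. */
static int demo_set_socket_timeouts(int fd, long seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;

    /* Bound how long a blocking recv() may wait for data. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) != 0) {
        return -1;
    }
    /* Bound how long a blocking send() may wait for buffer space. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) != 0) {
        return -1;
    }
    return 0;
}
```

Calling demo_set_socket_timeouts(fd, 5) right after the socket is created
would mean a stalled send() returns with an error instead of blocking
forever, which turns the hang into something the task can recover from
without addressing its cause.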

