From: Daniel Pauli
Subject: Re: [lwip-users] TCP send() fails when other sockets perform retransmissions
Date: Fri, 30 Dec 2016 19:59:41 +0100

Many thanks for your advice! I'll try to check the memory pools when I reproduce the issue next time.

Further, since it sounds like you initially had sockets configured in blocking mode, when the new socket tries to transmit, it will block trying to allocate TCP segments due to the exhausted memory pool.  The blocking will continue until the send timeout (SO_SNDTIMEO) is reached or the memory exhaustion is resolved.

To clarify, I ran two tests: in the first, all sockets used the MSG_DONTWAIT flag for send() (non-blocking); in the second, no socket used the flag (blocking). So, from my point of view, there should be no mixing of blocking and non-blocking calls. I'm not sure I understand what you mean by "initially configured in blocking mode". Does this mean that send() may still block under certain circumstances (exhausted memory pool) even with the MSG_DONTWAIT flag set, so I should set the O_NONBLOCK option on the socket up front to ensure that send() never blocks?
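For reference, the two approaches being compared (a per-call MSG_DONTWAIT flag versus O_NONBLOCK set on the socket itself) can be sketched as follows. This is a minimal sketch written against the POSIX headers; with lwIP one would presumably include "lwip/sockets.h" instead. The helper names set_nonblocking and send_nonblocking are hypothetical, and treating ENOMEM as "try again later" is an assumption based on the errors reported in this thread, not documented lwIP policy.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: put the descriptor itself into non-blocking mode,
 * so every later send() on it is non-blocking without a per-call flag.
 * Returns 0 on success, -1 on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: plain send() on a socket prepared with
 * set_nonblocking(). Returns the number of bytes queued, 0 if nothing could
 * be queued right now, or -1 on a hard error (errno is left set).
 * ASSUMPTION: ENOMEM is treated like EWOULDBLOCK, i.e. as a transient
 * condition worth retrying once buffers have been freed. */
static ssize_t send_nonblocking(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, 0);
    if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN || errno == ENOMEM)) {
        return 0;
    }
    return n;
}
```

Whether MSG_DONTWAIT alone is sufficient on lwIP 2.0.0 RC1 is exactly the open question here; the sketch only shows how the socket-level option would be set.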

Daniel

On Fri, Dec 30, 2016 at 6:30 PM, Joel Cunningham <address@hidden> wrote:

On Dec 30, 2016, at 10:43 AM, Daniel Pauli <address@hidden> wrote:

I'm a little confused about the use of select in your application.  Are you using it with blocking sockets?

I tested with both blocking and non-blocking send. I observed that non-blocking send (MSG_DONTWAIT flag set) on sockets reported as write-ready by select() sometimes returned ENOMEM when "stale sockets" were around. After applying the patch from http://lwip.100.n7.nabble.com/bug-49684-lwip-netconn-do-writemore-non-blocking-ERR-MEM-treated-as-failure-td27860.html, I got EWOULDBLOCK errors instead.


Thanks for including this information. The ENOMEM gives me a good clue of what’s most likely going on.  My guess is that you’re experiencing a memory pool exhaustion and the stale socket has claimed memory from a pool for the segments which are queued for transmit.  Since those segments are not being ACKed in the half-open state, the claimed memory won’t be available until the segments are freed (which happens on transmission timeout or when the socket is aborted).

Further, since it sounds like you initially had sockets configured in blocking mode, when the new socket tries to transmit, it will block trying to allocate TCP segments due to the exhausted memory pool.  The blocking will continue until the send timeout (SO_SNDTIMEO) is reached or the memory exhaustion is resolved.
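A send timeout of the kind mentioned above is set per socket with setsockopt(); the BSD option name is SO_SNDTIMEO. The following is a minimal sketch against the POSIX headers, with set_send_timeout being a hypothetical helper name. As far as I know, lwIP only honours this option when LWIP_SO_SNDTIMEO is enabled in lwipopts.h, and it may expect a plain int in milliseconds instead of a struct timeval when LWIP_SO_SNDRCVTIMEO_NONSTANDARD is set, so check your configuration before relying on it.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: limit how long a blocking send() may wait.
 * timeout_ms is the limit in milliseconds. Returns 0 on success, -1 on error.
 * ASSUMPTION: the stack takes a struct timeval for SO_SNDTIMEO (POSIX form). */
static int set_send_timeout(int fd, long timeout_ms)
{
    struct timeval tv;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}
```

With such a timeout in place, a blocking send() that cannot get buffer space should fail after the given time instead of waiting indefinitely.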

If you have LwIP stats enabled, you can check the memory pools for errors to figure out which one is failing.  You should be able to resolve this by sizing your memory pools to handle the number of supported connections.  For example, if you only support 5 simultaneous TCP connections, then your pools should be big enough to allocate 5 send buffers' worth of segments.  This is how I configure my products, which typically have plenty of RAM.  Not sure what the recommendation is for very constrained RAM products.
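As a rough illustration of that sizing rule, an lwipopts.h fragment for 5 simultaneous connections might look like the sketch below. The concrete numbers are assumptions chosen for the example, not values from this thread. The TCP_SND_QUEUELEN expression mirrors what I understand to be the default formula in lwIP's opt.h, and other pools (pbufs, heap) may need similar treatment depending on how data is passed to the stack.

```c
/* lwipopts.h fragment (illustrative values only) */

/* ASSUMPTION: at most 5 TCP connections are open at the same time. */
#define MEMP_NUM_TCP_PCB   5

/* ASSUMPTION: Ethernet-sized MSS and a send buffer of 4 full segments. */
#define TCP_MSS            1460
#define TCP_SND_BUF        (4 * TCP_MSS)

/* Upper bound on segments queued per connection (same shape as the
 * default in opt.h; evaluates to 16 with the values above). */
#define TCP_SND_QUEUELEN   ((4 * (TCP_SND_BUF) + (TCP_MSS - 1)) / (TCP_MSS))

/* Enough TCP segment structures for every connection to fill its send
 * queue at once, so one stalled peer cannot starve the others
 * (evaluates to 80 with the values above). */
#define MEMP_NUM_TCP_SEG   (MEMP_NUM_TCP_PCB * TCP_SND_QUEUELEN)

/* Enable statistics so pool allocation errors can be inspected. */
#define LWIP_STATS         1
#define MEMP_STATS         1
```

This sizes for the worst case, which costs RAM; on constrained targets a smaller TCP_SND_BUF per connection is the obvious lever.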


Calling close() will initiate a graceful synchronized closure of the connection.  This means continuing to send any queued data until it is ACKed, the send times out, or a RST is received.  Then a FIN is sent indicating the sending pathway is closed.

So there's no direct way for the application to tell LWIP to just give up on one socket without further trying to send data? Can the application specify a send timeout?

Yes, there is: with SO_LINGER you can perform an abortive closure rather than a graceful one by setting the timeout to 0.  Typically this is a bad idea.  There’s a decent discussion of this on Stack Overflow:

http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required
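The abortive closure described above can be sketched as follows. This is a minimal sketch against the POSIX headers; abortive_close is a hypothetical helper name. As far as I know, lwIP needs LWIP_SO_LINGER enabled in lwipopts.h for the option to be accepted.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: close a socket abortively instead of gracefully.
 * With l_onoff = 1 and l_linger = 0, close() discards queued data and
 * resets the connection (RST) instead of retransmitting until a timeout.
 * Returns 0 on success, -1 on error. */
static int abortive_close(int fd)
{
    struct linger lg;

    lg.l_onoff = 1;   /* enable linger handling */
    lg.l_linger = 0;  /* zero timeout: abort, do not wait for ACKs */
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0) {
        return -1;
    }
    return close(fd);
}
```

Given the caveat above, this is probably best reserved for sockets already judged stale (for example, not write-ready for 10 seconds), not used as the default way to close.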


Lastly, what version of LwIP are you using?

I'm using 2.0.0 RC1

Joel

On Wed, Dec 28, 2016 at 4:23 PM, Joel Cunningham <address@hidden> wrote:


On Dec 28, 2016, at 06:45 AM, Daniel Pauli <address@hidden> wrote:

Am I understanding the description correctly that sending on the stale connection eventually blocks once the remote side has crashed and this prevents sending on the new socket (only because the thread is blocked)?

If so, then the socket buffer on the stale socket has filled up (most likely) and is now blocking.  This is blocking I/O operating as expected when data is not being acknowledged.  You should use non-blocking sockets and select if your server is servicing multiple sockets on a single thread.

Joel

Attempting to send on the stale socket blocks, which is okay on its own. But I'm already using select() and observed that
 
these stale sockets still somehow seem to block communication over new sockets,

If this is actually happening as described, that would be unexpected/faulty behavior.  One TCP socket in the half-open state should not have any effect on the other TCP connections.
 
even when no stale sockets are included in the write set of select().

I'm a little confused about the use of select in your application.  Are you using it with blocking sockets?  Select returning write-ability doesn't guarantee the send call won't block.  If you have a blocking socket and the size in the send call can't fit in the amount of available buffer space, the call will block.
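One way to combine select() with a send that cannot block is sketched below: wait for writability with a timeout, then make a single non-blocking send attempt and accept a partial write. This is a minimal sketch against the POSIX headers; send_when_writable is a hypothetical helper name, and MSG_DONTWAIT is the same flag used in the tests described in this thread.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become writable,
 * then try one non-blocking send. Returns the number of bytes queued
 * (possibly fewer than len), 0 if nothing could be queued in time,
 * or -1 on error. */
static ssize_t send_when_writable(int fd, const void *buf, size_t len,
                                  long timeout_ms)
{
    fd_set wfds;
    struct timeval tv;
    int ready;
    ssize_t n;

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ready = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (ready < 0) {
        return -1;      /* select() itself failed */
    }
    if (ready == 0) {
        return 0;       /* not writable within the timeout */
    }

    /* Writable is only a hint: the send may still be unable to queue data. */
    n = send(fd, buf, len, MSG_DONTWAIT);
    if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN)) {
        return 0;
    }
    return n;
}
```

The caller keeps track of how many bytes were accepted and retries the remainder on a later pass, so one slow peer never stalls the loop that services the other sockets.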
 
I even close() (successfully, according to the return value) those stale sockets after they failed to be write-ready after 10 seconds, but I can see in Wireshark that LWIP still sends retransmissions from the port number of the closed socket. 

Could it be that close() cannot send FIN because the output buffer is full, so the socket still remains active? Is there a way from the API to just drop the connection without involving any more communication?

Calling close() will initiate a graceful synchronized closure of the connection.  This means continuing to send any queued data until it is ACKed, the send times out, or a RST is received.  Then a FIN is sent indicating the sending pathway is closed.

Lastly, what version of LwIP are you using?

Joel

_______________________________________________
lwip-users mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/lwip-users
