
Re: [lwip-users] Issue with "blocked" pcbs


From: Klaus Breining
Subject: Re: [lwip-users] Issue with "blocked" pcbs
Date: Wed, 30 Dec 2020 14:19:30 +0100

Hi Indan,

thanks a lot, I have found the issue. Because mbedtls needs a lot of memory and my CPU has only 256 kByte, I reduced the maximum number of tcp_pcbs to 4. If I really flood the system with requests (I usually try to torture a system until it is 100% stable), some packets are lost because no DMA buffers are left, so some PCBs hang in the FIN_WAIT_1 state. But not forever: after a very long timeout (by default more than 2 hours on my system) they go back to CLOSED. Of course, if all 4 PCBs are in FIN_WAIT_1, the system cannot respond anymore.

I have now reduced TCP_MAXRTX to 6, which brings my timeout down to around 5 minutes. This ensures that the system recovers within a reasonably acceptable time. I may also increase the number of DMA buffers.
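
For reference, the two settings mentioned so far would typically live in lwipopts.h. This is only a minimal sketch: it assumes MEMP_NUM_TCP_PCB is the option used to cap the number of TCP PCBs (the message does not name it), and the values are the ones quoted above.

```c
/* lwipopts.h (fragment, illustrative only) */

/* Assumption: this is the option that limits the number of
 * simultaneously active TCP PCBs; the message only says "4". */
#define MEMP_NUM_TCP_PCB  4

/* Maximum number of retransmissions of data segments before the
 * connection is dropped. The lwIP default in opt.h is 12; a lower
 * value shortens the time a stuck PCB stays allocated. */
#define TCP_MAXRTX        6
```

Fewer retransmissions also mean that a connection over a genuinely lossy link is given up sooner, so the value is a trade-off rather than a universally better setting.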

As far as I can see, my error handling in altcp_close() is correct. But I have found an issue in altcp_tcp_accept(): it is called twice, once for each of the 2 linked altcp_pcbs, and if the second one fails, the first one is not cleaned up, which causes a memory leak. I have posted this on lwip-devel:

https://lists.nongnu.org/archive/html/lwip-devel/2020-12/msg00014.html
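
The leak follows a common pattern: two linked objects are set up one after the other, and a failure on the second must release the first. The sketch below only illustrates that cleanup step. All names in it (`conn_t`, `pool_alloc`, `pool_free`, `pool_used`, `accept_pair`) are hypothetical stand-ins, not lwIP's altcp structures or allocator, and it is not the patch posted to lwip-devel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one layer's connection object (not an lwIP type). */
typedef struct conn {
    struct conn *inner;  /* linked lower-layer object */
    int in_use;
} conn_t;

#define POOL_SIZE 4
static conn_t pool[POOL_SIZE];

/* Hypothetical fixed-size pool allocator; returns NULL when exhausted. */
conn_t *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].inner = NULL;
            return &pool[i];
        }
    }
    return NULL;
}

void pool_free(conn_t *c)
{
    assert(c != NULL);
    c->in_use = 0;
}

/* Number of pool entries currently allocated. */
int pool_used(void)
{
    int n = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        n += pool[i].in_use;
    }
    return n;
}

/* Allocate the outer and inner objects as a pair.
 * If the second allocation fails, release the first so nothing leaks. */
conn_t *accept_pair(void)
{
    conn_t *outer = pool_alloc();
    if (outer == NULL) {
        return NULL;
    }
    conn_t *inner = pool_alloc();
    if (inner == NULL) {
        pool_free(outer);  /* the cleanup step that prevents the leak */
        return NULL;
    }
    outer->inner = inner;
    return outer;
}
```

With only one free slot left, `accept_pair()` returns NULL and the pool usage is unchanged afterwards. Without the `pool_free(outer)` line, every failed accept would permanently consume one slot.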

Regards
Klaus

On 30.12.2020 at 12:29, Indan Zupancic wrote:
Hello,

On 2020-12-29 20:07, Klaus Breining wrote:
I have seen the following sequence:
- Receive SYN
- Answer SYN/ACK
- Receive ACK
Up to now everything is normal. Here altcp_tcp_accept() is called.
PCB is ESTABLISHED
- Receive PSH/ACK
- Answer PSH/ACK
- Receive ACK
- Send FIN/PSH/ACK
- Receive RST/ACK
Reset processing ends up in tcp_ack_now(), the flag acceptable is not
set. The function tcp_process() returns ERR_OK without doing anything
with the PCB.

How will this PCB ever disappear? It is now in state FIN_WAIT_1 with
flags TF_RXCLOSED | TF_FIN. The PCB will never receive anything again
and the slow timer tcp_slowtmr() is the only function that could
change the state of a PCB - but it only resends the last FIN/PSH/ACK
packet every few seconds to a port which the browser on the other
side has obviously already closed.

For TCP, closing one end of the connection does not imply that the
other end is closed too. If the other side doesn't also close the
connection then you end up in your situation.

Normally the application closes the socket when receiving a FIN
(tcp_recv_fn will be called with a NULL pbuf if the connection is
closed). An extra complication is that if you have data queued
for transmit, you probably want to finish sending that first
before actually closing the connection.

TLS has the added complication that it first needs to decode
and handle all encrypted data before it can pass the FIN to
the application layer, see altcp_mbedtls_lower_recv(). In this
corner case ALTCP_MBEDTLS_FLAGS_RX_CLOSE_QUEUED will be set
and later handled in altcp_mbedtls_pass_rx_data(), which is
called periodically by the polling function. As far as I can
tell the current altcp code is correct in this regard. Same
for http_close_or_abort_conn().

Does your modified version also have error handling for the
altcp_close() call? (Similar polling approach or calling
altcp_abort() on error.) If not, the call may fail when low
on memory and if your application code does nothing you will
stay in this state forever.
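
The two options above (retry from the poll function, or abort on
error) can be combined. The sketch below is only an illustration:
`fake_stack_t`, `fake_close()`, `fake_abort()`, `app_conn_t`,
`app_close()`, `app_poll()` and `MAX_CLOSE_RETRIES` are hypothetical
names. In a real application the stubs would be altcp_close() and
altcp_abort(), and the retry would run from the poll callback.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLOSE_RETRIES 3  /* hypothetical retry budget */

/* Hypothetical stand-in for the stack: close fails while memory is short. */
typedef struct {
    int close_failures_left;  /* how many more close attempts will fail */
    int aborted;              /* set once abort has been called */
    int closed;               /* set once close has succeeded */
} fake_stack_t;

/* Stub for a close call that can fail (e.g. out of memory). */
int fake_close(fake_stack_t *s)
{
    if (s->close_failures_left > 0) {
        s->close_failures_left--;
        return -1;
    }
    s->closed = 1;
    return 0;
}

/* Stub for an abort call, which always releases the connection. */
void fake_abort(fake_stack_t *s)
{
    s->aborted = 1;
}

/* Application-side connection state. */
typedef struct {
    fake_stack_t *stack;
    int close_pending;  /* a failed close must be retried */
    int retries;        /* failed retries so far */
} app_conn_t;

/* Try to close; on failure remember to retry from the poll function. */
void app_close(app_conn_t *c)
{
    assert(c != NULL && c->stack != NULL);
    c->close_pending = (fake_close(c->stack) != 0);
}

/* Called periodically: retry a pending close, and abort once the
 * retry budget is spent so the connection cannot linger forever. */
void app_poll(app_conn_t *c)
{
    assert(c != NULL && c->stack != NULL);
    if (!c->close_pending) {
        return;
    }
    if (fake_close(c->stack) == 0) {
        c->close_pending = 0;
        return;
    }
    if (++c->retries >= MAX_CLOSE_RETRIES) {
        fake_abort(c->stack);
        c->close_pending = 0;
    }
}
```

The point of the retry budget is that the application always ends up
either closed or aborted, so a failed close cannot leave the
connection allocated indefinitely.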

Greetings,

Indan
