Re: [lwip-users] Out of memory in PCP_PCB pool after 2^32 milliseconds


From: Trampas Stern
Subject: Re: [lwip-users] Out of memory in PCP_PCB pool after 2^32 milliseconds
Date: Fri, 28 May 2021 16:05:31 -0400

So a trick I use in my code and libraries is to use typedefs for variables.

#include <stdint.h>

typedef uint32_t milliseconds_t;
milliseconds_t getMillis(void);

Then I use milliseconds_t to define all variables.  This allows me to change it to uint64_t in one location depending on the project. 
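
A minimal sketch of how the single-location change described above might look. The build switch MILLIS_USE_64BIT, the counter g_ms_ticks, and the function millis_tick_isr() are hypothetical names chosen for illustration; only milliseconds_t and getMillis() come from the message.

```c
#include <stdint.h>

/* Hypothetical build switch: define MILLIS_USE_64BIT in projects that want
 * a counter that effectively never wraps. */
#ifdef MILLIS_USE_64BIT
typedef uint64_t milliseconds_t;
#else
typedef uint32_t milliseconds_t;
#endif

/* Hypothetical tick counter, assumed to be incremented once per millisecond
 * by a timer interrupt. */
static volatile milliseconds_t g_ms_ticks;

/* Assumed to be called from a 1 kHz timer interrupt. */
void millis_tick_isr(void)
{
    g_ms_ticks++;
}

/* Note: on a 32-bit MCU a 64-bit read is not atomic, so a uint64_t build
 * would also need to guard this read against the interrupt. */
milliseconds_t getMillis(void)
{
    return g_ms_ticks;
}
```

Every variable that holds a time is then declared as milliseconds_t, so switching the width touches only the typedef.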

I have started using more typedefs like this as a form of documentation. That is, code is easier to read and follow when variables are declared with a type that reflects their use.

A neat unsigned (modular) math trick applies when doing comparisons:

milliseconds_t start=   getMillis();

// This is bad 
while( getMillis()<(start +10) ){  //wait for 10ms 
.... 
}

To understand why, assume milliseconds_t is uint8_t. Say start is 255: then (start+10), stored back into a milliseconds_t, wraps to 9, while getMillis() on the first pass through the loop is still 255. The comparison becomes while(255<9), so you exit the while loop immediately instead of waiting 10 ms.
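
The early exit can be reproduced on a host with a small sketch. The helper name still_waiting_deadline() is hypothetical. One assumption matters: with a genuine uint8_t, C would promote start + 10 to int, so the sketch stores the sum back into a milliseconds_t to reproduce the truncation that a uint32_t sees on a 32-bit target.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t milliseconds_t; /* 8 bits only to keep the numbers small */

/* The "bad" pattern: compare the current time against a precomputed
 * deadline. The deadline is truncated to milliseconds_t, so it wraps. */
bool still_waiting_deadline(milliseconds_t now, milliseconds_t start,
                            milliseconds_t wait)
{
    milliseconds_t deadline = (milliseconds_t)(start + wait); /* 255 + 10 -> 9 */

    return now < deadline;
}
```

With start = 255 and now = 255 the function returns false at once, which corresponds to the loop exiting without waiting.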

A better way to do this is 
milliseconds_t start=   getMillis();

// This is good
while( (getMillis()-start)<10 ){  //wait for 10ms 
.... 
}

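Here, if start and getMillis() are both 255, the first pass through the loop is while(0<10). On the next millisecond getMillis() wraps to 0, and we have (getMillis()-start) = (0-255) = 1. To understand this, look at the math in binary:
 0000 0000
-1111 1111
= 1 0000 0001, where the leading 1 is the borrow out of the top bit. Since we are 8-bit unsigned, that bit is discarded and the value is 1. This means that unsigned subtraction gives you the difference modulo 2^8 (2^32 for uint32_t), which is the elapsed time as long as the real interval is shorter than one full wrap of the counter.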
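
A minimal host-side sketch of the subtraction pattern, again using an 8-bit type to keep the numbers small. The helper name still_waiting_elapsed() is hypothetical. The cast back to milliseconds_t is needed for uint8_t because of integer promotion; with uint32_t on a 32-bit target the subtraction already wraps on its own.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t milliseconds_t; /* 8 bits only to keep the numbers small */

/* The "good" pattern: compute the elapsed time with an unsigned subtraction
 * and compare that against the wait interval. */
bool still_waiting_elapsed(milliseconds_t now, milliseconds_t start,
                           milliseconds_t wait)
{
    milliseconds_t elapsed = (milliseconds_t)(now - start); /* 0 - 255 -> 1 */

    return elapsed < wait;
}
```

Starting at 255 with a 10 ms wait, the function keeps returning true while the counter reads 255, 0, 1, ... 8 and returns false once it reads 9, i.e. after exactly 10 ms.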

Now, with that said, the code works, but other developers might not understand it, and you risk them adding or modifying code in a way that breaks things. Therefore I often just use uint64_t to make sure other developers do not break the code. If speed becomes an issue I can optimize the code to use the unsigned math tricks, but only as a last resort.

Note that I know many developers who refuse to use unsigned variables because of math issues like the one above, so they try to use signed integers for almost everything. You still have overflow issues, but you do not have the unsigned math issues.

Here is a blog article I wrote on embedded systems and time: 
https://bitvolatile.com/?p=303

Trampas





On Fri, May 28, 2021 at 3:25 PM Adam Baron <vysocan76@gmail.com> wrote:
Hello Trampas,

Thanks for the hints. I initialized the sys ticks to 2^32 - 120 seconds, and I got mqtt pbuf=NULL after around 120 seconds + 120 keep-alive seconds.

The ChibiOS sys_arch.c port implements sys_now() (current time in milliseconds) roughly as follows (simplified):
  return ((u32_t)chVTGetSystemTimeX() - 1) / 10 + 1;
since the system tick period is 100 µs.

I guess this might cause the problems: the result overflows back to 0 once it reaches about (2^32)/10, leaving the lwIP timers waiting for a value higher than (2^32)/10 that never arrives.

To support my guess, I turned on another debug option, and the last lwIP timer message I see is:
sys_timeout: 2000C5DC abs_time=429497730 handler=ip_reass_tmr arg=805B28C
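
The arithmetic behind this guess can be checked on a host. In the sketch below, sys_now_from_ticks() is the simplified formula quoted above with the tick count passed in as a parameter; ms_clock_t and ms_clock_update() are hypothetical names for one possible workaround and are not the ChibiOS port's actual fix. A 32-bit counter of 100 µs ticks wraps after about 4.97 days, and at that point the derived millisecond value jumps from 429496730 back to 1 instead of running on to 2^32, so unsigned subtraction on it no longer yields the elapsed time.

```c
#include <stdint.h>

/* The simplified formula quoted above, with the raw 100 us tick count passed
 * in as a parameter so the arithmetic can be checked on a host machine. */
uint32_t sys_now_from_ticks(uint32_t ticks)
{
    return (ticks - 1u) / 10u + 1u;
}

/* Hypothetical alternative: keep a free-running millisecond counter that
 * wraps only at 2^32 by accumulating wrap-safe tick differences. */
typedef struct {
    uint32_t last_ticks; /* tick count seen on the previous update */
    uint32_t remainder;  /* leftover ticks (0..9) not yet worth a full ms */
    uint32_t now_ms;     /* millisecond counter, wraps only at 2^32 */
} ms_clock_t;

/* Must be called at least once per wrap of the tick counter. */
uint32_t ms_clock_update(ms_clock_t *clk, uint32_t ticks)
{
    uint32_t delta = ticks - clk->last_ticks; /* correct across a tick wrap */

    clk->last_ticks = ticks;
    clk->now_ms += delta / 10u;
    clk->remainder += delta % 10u;
    if (clk->remainder >= 10u) {
        clk->remainder -= 10u;
        clk->now_ms++;
    }
    return clk->now_ms;
}
```

Because now_ms in the second variant uses the full 32-bit range, code that computes (now - start) on it keeps working across the wrap.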


Adam

On Fri, May 28, 2021 at 1:45 PM Trampas Stern <trampas@gmail.com> wrote:
Increase the counter to a uint64_t. 

You can also start the counter at something other than zero to prove root cause faster.
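
A minimal sketch of that second suggestion, assuming a hypothetical software millisecond counter (the names WRAP_TEST_LEAD_MS, g_test_ms_ticks, wrap_test_tick_isr() and wrap_test_millis() are invented for illustration). The counter is initialized so that it wraps a chosen time after boot rather than after the full 2^32 counts.

```c
#include <stdint.h>

/* Hypothetical: how long after boot the wrap should happen (120 s here). */
#define WRAP_TEST_LEAD_MS 120000u

/* Start at 2^32 - WRAP_TEST_LEAD_MS instead of 0. */
static volatile uint32_t g_test_ms_ticks =
    UINT32_MAX - (WRAP_TEST_LEAD_MS - 1u);

/* Assumed to be called from a 1 kHz timer interrupt. */
void wrap_test_tick_isr(void)
{
    g_test_ms_ticks++;
}

uint32_t wrap_test_millis(void)
{
    return g_test_ms_ticks;
}
```

If the failure then shows up two minutes after boot instead of days later, the counter wrap is confirmed as the trigger.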

Trampas

On Fri, May 28, 2021 at 7:08 AM Adam Baron <vysocan76@gmail.com> wrote:
Hi Tomek :),

I'll try to add it. Thanks.

However, I feel it is more likely related to a uint32 counter of some kind overflowing, since the TCP_PCBs are no longer freed after 2^32 ticks.

Adam

On Fri, May 28, 2021 at 9:44 AM Tomasz W <wilkxt@gmail.com> wrote:
Hi (Cześć)
Look at this: https://lists.nongnu.org/archive/html/lwip-devel/2020-12/msg00014.html
In my case it solved the problem of the web server dying after a few days.


On Fri, May 28, 2021 at 8:58 AM Adam Baron <vysocan76@gmail.com> wrote:
>
> Hello all,
>
> I have a small STM32F4 application running on the devel branch of lwIP. It includes httpd, sntp, an smtp client, and an mqtt client. Everything runs well until the fifth day, when the mqtt client starts to receive pbuf=NULL and disconnects. My reconnect routine reconnects it shortly afterwards, but it receives pbuf=NULL again soon after.
>
> Later on I also noticed in the log: memp_malloc: out of memory in pool TCP_PCB.
> I have MEMP_NUM_TCP_PCB defined as 30, which seems to be enough for normal operation. I also raised it to 50, but ended up with the same problem.
> In the statistics, the NUM_TCP_PCB count increases and decreases as it should, but once uptime passes 5 days it stays high with an error flag set.
>
> Interestingly, it happens exactly after 2^32 milliseconds of uptime. I tried to keep OpenOCD connected so I could peek in at that moment, but so far I have not managed to keep OpenOCD running that long without the connection dropping.
>
> Does anyone have any ideas please?
>
> Thanks in advance,
> --
> 731435556
> Adam Baron
> _______________________________________________
> lwip-users mailing list
> lwip-users@nongnu.org
> https://lists.nongnu.org/mailman/listinfo/lwip-users



--
Regards
Tomek

