Hi,
After upgrading lwIP from 1.3.2 to 1.4.0-rc1, a problem started showing
up in our products; it is still present in -rc2. I bet it is there in 1.4.0
too, even though I have not tested it, as the relevant code has not changed.
I think this problem can be considered a bug, although I may be doing
something wrong, so I'm asking here.
The problem is caused by the introduction of the current_iphdr_dest/_src
global variables, which made udp_input() and probably other functions
non-reentrant.
The context in which the bug showed up is the following.
A NO_SYS==1 lwIP-based device has one Ethernet netif plus a loopback netif
(LWIP_NETIF_LOOPBACK).
Two different logical entities (A and B) exist and run independently on the
device.
Entity A sends UDP packets to a target entity, which may be entity B, thus
the packets sent from A to B are queued into the loopif queue.
If the packet flow is sufficiently high, this happens frequently:
1. entity A sends a packet to entity B; the packet (pck1) is enqueued into the
loopif queue;
2. the main application loop calls netif_poll, which calls ip_input to handle
pck1;
3. ip_input (line ~314) sets current_iphdr_dest/_src with values from pck1;
4. ip_input calls udp_input for pck1;
5. before udp_input does any real stuff, an incoming Ethernet packet (pck2)
triggers an interrupt;
6. in IRQ context, the new packet reaches ip_input which overwrites
current_iphdr_dest/_src with values from pck2 (non-reentrancy);
7. when pck2 has been handled, the code exits the IRQ handling;
8. execution continues in udp_input, where it was about to handle pck1;
9. udp_input checks current_iphdr_dest/_src, which have been overwritten
with values from pck2 while pck1 is being handled; this is clearly wrong,
and leads to (at least) dropped packets.
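
To illustrate the sequence above, here is a minimal self-contained sketch in
C. It is not lwIP code: current_dest stands in for current_iphdr_dest, and
sim_ip_input, sim_udp_input, eth_irq, simulated_irq and pck1_delivered are
hypothetical names made up for this model. The "interrupt" is simulated by a
plain function call at the point where the real IRQ would hit (step 5).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the global current_iphdr_dest (simplified to a raw address). */
static uint32_t current_dest;

/* Hypothetical hook: if set, it is called once to mimic the Ethernet IRQ. */
static void (*simulated_irq)(void);

/* Simplified udp_input: the packet is accepted only if the global still
 * holds the destination of the packet being handled. */
static bool sim_udp_input(uint32_t expected_dest)
{
    if (simulated_irq != NULL) {
        void (*irq)(void) = simulated_irq;
        simulated_irq = NULL;   /* fire only once */
        irq();                  /* steps 5 to 7: IRQ runs before the check */
    }
    return current_dest == expected_dest;   /* step 9 */
}

/* Simplified ip_input: stores the destination in the global (step 3),
 * then hands the packet to the UDP layer (step 4). */
static bool sim_ip_input(uint32_t pkt_dest)
{
    current_dest = pkt_dest;
    return sim_udp_input(pkt_dest);
}

/* IRQ context: pck2 arrives on Ethernet and overwrites the global (step 6). */
static void eth_irq(void)
{
    (void)sim_ip_input(0x0A000002u);   /* pck2, arbitrary example address */
}

/* Returns true if pck1 (sent over the loopif) is accepted by sim_udp_input. */
bool pck1_delivered(bool irq_fires)
{
    simulated_irq = irq_fires ? eth_irq : NULL;
    return sim_ip_input(0x7F000001u);  /* pck1, arbitrary example address */
}
```

In this model pck1_delivered(false) returns true, while pck1_delivered(true)
returns false: the nested call leaves pck2's address in the global, so the
check made on behalf of pck1 fails.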
Do you think I did something wrong, or is my analysis correct?
Can this be considered a bug? Should I then file a bug report?
I could not closely follow the discussion that led to introducing
current_iphdr_dest/_src and do not have a plan to solve this issue, but
at least I found a simple workaround: blocking interrupts before calling
netif_poll() makes the product work as before.
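
The shape of the workaround is roughly the following. This is only a sketch:
port_disable_irq and port_enable_irq are hypothetical placeholders for
whatever the platform provides, and loopif_poll_stub stands in for the real
netif_poll() call on the loopback netif so that the example compiles on its
own.

```c
#include <stdbool.h>

/* Simulated interrupt-enable state; a real port would touch the CPU or
 * interrupt controller instead (hypothetical placeholders). */
static bool irq_enabled = true;
static bool polled_with_irq_masked;

static void port_disable_irq(void) { irq_enabled = false; }
static void port_enable_irq(void)  { irq_enabled = true; }

/* Stand-in for netif_poll() on the loopback netif; it only records
 * whether interrupts were masked while it ran. */
static void loopif_poll_stub(void)
{
    polled_with_irq_masked = !irq_enabled;
}

/* Poll the loopif with interrupts blocked, so that the Ethernet IRQ cannot
 * re-enter ip_input while a loopback packet is being processed.
 * Returns true if the poll ran masked and interrupts are enabled again. */
bool poll_loopif_guarded(void)
{
    port_disable_irq();
    loopif_poll_stub();
    port_enable_irq();
    return polled_with_irq_masked && irq_enabled;
}
```

The obvious cost is that interrupts stay blocked for the whole duration of
the poll, which adds interrupt latency when the loopif queue is long.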
Thanks,
Luca
_______________________________________________
lwip-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/lwip-devel