We have been using lwIP version 1.2, with some patches applied, as our
embedded network stack. Our two Ethernet devices are directly connected,
and both are running our stack. The MTU is 1500, as expected on Ethernet;
however, our TCP MSS value is set to 1476.
My reading of the RFCs and the like indicates that the value should be no
larger than 1500 (MTU) - 20 (IP header) - 20 (TCP header) = 1460. I also
understand that TCP options in the header could make the usable value
smaller. At this point, we are not using any TCP options, so we believe
1460 would suffice. I would definitely appreciate any feedback on that
assertion.
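
To make the arithmetic above concrete, here is a minimal sketch in C. The
helper name max_tcp_payload_for_mtu() and the two macros are made up for
illustration and are not part of lwIP; the sketch assumes IPv4 with no IP
options.

```c
#include <stdint.h>

/* Minimum header sizes, i.e. with no IP or TCP options present. */
#define IPV4_HEADER_LEN 20u
#define TCP_HEADER_LEN  20u

/*
 * Hypothetical helper (not an lwIP function): largest TCP payload that
 * fits in a single unfragmented IPv4 packet for a given link MTU.
 * tcp_option_len is the number of TCP option bytes carried per segment.
 * Returns 0 if the MTU cannot hold the headers plus any payload.
 */
uint16_t max_tcp_payload_for_mtu(uint16_t mtu, uint16_t tcp_option_len)
{
    uint32_t overhead = IPV4_HEADER_LEN + TCP_HEADER_LEN + tcp_option_len;

    if (mtu <= overhead) {
        return 0;
    }
    return (uint16_t)(mtu - overhead);
}
```

With an MTU of 1500 and no TCP options this gives 1460, so a configured
MSS of 1476 is 16 bytes larger than what fits in one Ethernet frame.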
With that said, and given our misconfigured MSS value of 1476, here is
our current experience:
When we configure lwIP with a TCP_MSS of greater than 1460, and connect
to a peer with a similarly large MSS, and the "negotiated" MSS exceeds
the MTU, we're seeing an error in the IP fragmentation code applied to
TCP packets. Specifically ip_frag() is dereferencing a null pointer.
We've tracked this down to the entry condition of ip_frag() where the
p->tot_len is exceeding the sum of the p->len of the p->next chain. In
our case the p->tot_len is 3016, and the sum of the p->len is 1516.
Since the loop is traversing the packet chain until the tot_len is zero,
this walks off the end of the chain.
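
The entry condition described above can be checked before the chain is
handed to the fragmentation code. The following is only a sketch: struct
mini_pbuf is a simplified stand-in for lwIP's struct pbuf that keeps just
the three fields mentioned here, the function names are made up, and the
check assumes the chain holds exactly one packet (it runs to a NULL next
pointer).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for lwIP's struct pbuf (illustration only). */
struct mini_pbuf {
    struct mini_pbuf *next;
    uint16_t tot_len; /* length of this buffer plus all buffers after it */
    uint16_t len;     /* length of this buffer only */
};

/* Sum the len field of every buffer in the chain, stopping at NULL. */
uint32_t chain_len_sum(const struct mini_pbuf *p)
{
    uint32_t sum = 0;

    while (p != NULL) {
        sum += p->len;
        p = p->next;
    }
    return sum;
}

/*
 * Returns 1 if the head's tot_len matches the summed len of the chain,
 * 0 otherwise. An empty chain is treated as consistent.
 */
int chain_is_consistent(const struct mini_pbuf *p)
{
    if (p == NULL) {
        return 1;
    }
    return (uint32_t)p->tot_len == chain_len_sum(p);
}
```

A chain with tot_len 3016 whose len fields sum to 1516 fails this check,
which would flag the problem at the point where the chain was built or
modified rather than later inside ip_frag().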
What I'm unsure of is whether the IP fragmentation code can tolerate
this misconfiguration of the MSS relative to the MTU, or whether there
is likely an error in the fragmentation code, or perhaps in one of our
drivers.
Any thoughts on this are greatly appreciated. Additionally, since we are
executing in a real-time, multi-threaded environment, our investigation
is focusing on timing and race-condition scenarios. Feel free to chime
in with observations and experience in those areas as well.