I first came to lwIP with the same goals: I wanted high performance and
had no constraints on code size or RAM, but the attraction
of lwIP was that it was clean and simple and could be easily customised
to my needs. The problem with trying to optimise performance is that
you can speed up one traffic pattern at the expense of another. The
Nagle algorithm that started this discussion is a good example: for bulk
transfers Nagle can help a lot, but in your case with ping-pong style
traffic it harms performance. I think therefore that by keeping the
code clean and simple we serve both goals well. Those who need to
optimise for code size and RAM can do so. Those who need to optimise
for performance can do so. I'd encourage everyone to share their
improvements, but not all will make it back into the core stack. I am
therefore happy to continue with the project's goals as they stand.
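
For anyone with ping-pong style traffic who wants to see the effect of
Nagle for themselves, here is a minimal sketch of turning it off per
connection through the BSD-style TCP_NODELAY socket option. It is written
against plain POSIX sockets so it can be tried on a desktop first. The
helper names set_nodelay() and get_nodelay() are my own for illustration
and are not part of any stack. As far as I know lwIP's socket layer
accepts the same option, and the raw API offers tcp_nagle_disable(pcb),
but check that against the version you are using.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: enable (on != 0) or disable (on == 0) TCP_NODELAY
 * on a TCP socket. Enabling TCP_NODELAY switches the Nagle algorithm off,
 * so small writes are sent immediately instead of being coalesced.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int set_nodelay(int fd, int on)
{
    int flag = on ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}

/* Hypothetical helper: report the current TCP_NODELAY setting.
 * Returns 1 if Nagle is disabled, 0 if it is enabled, -1 on error. */
static int get_nodelay(int fd)
{
    int flag = 0;
    socklen_t len = sizeof(flag);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) != 0) {
        return -1;
    }
    return flag ? 1 : 0;
}
```

Calling set_nodelay(fd, 1) on a request/response connection and leaving
bulk-transfer connections at the default keeps the choice in the
application, rather than changing the stack's behaviour for everyone.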