|Subject:||[lwip-devel] UDP - transmission inefficiences|
|Date:||Mon, 8 Jun 2009 17:46:14 -0400|
In trying to isolate which components are responsible for the poor TCP transmission speeds I am seeing, I decided it’s easier to run UDP burst tests. These let me quantify the contribution of the Ethernet driver, processor and memory speed on my system. Since disappointing transmission speeds have come up numerous times from others, I hope my questions and experience will help others too.
UDP checksumming is off because I know it accounts for a big part of the delay, and since everyone can turn it off, that levels the field for anyone wanting to test UDP speeds and to see what checksumming costs on their platform. I see that 20% of the slowdown can be attributed to my hardware, mostly memory bandwidth. We’re working on that. The other 80% is spent in lwip.
The first thing I notice using udp_sendto is that ip_route is called for *every* call to udp_sendto. Shouldn’t the “if” be stored in the pcb, once the “if” is known, so that ip_route is called only once per pcb? If this is not a good idea, in which cases will the interface change for a pcb?
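To make the caching idea concrete, here is a minimal sketch in plain C. It does not use lwIP's real types: `demo_netif`, `demo_pcb`, `demo_route` and the other `demo_` names are hypothetical stand-ins, and the linear scan only resembles ip_route in spirit. The point is that the full lookup runs once per destination instead of once per packet. A cache like this would presumably have to be invalidated when the destination changes or when an interface is added, removed or readdressed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT lwIP's struct netif / struct udp_pcb. */
struct demo_netif { uint32_t addr, mask; };

struct demo_pcb {
    struct demo_netif *cached_if;   /* last interface chosen for this pcb */
    uint32_t cached_dst;            /* destination the cache is valid for */
};

static struct demo_netif demo_netifs[2] = {
    { 0xC0A80001u, 0xFFFFFF00u },   /* 192.168.0.1/24 */
    { 0x0A000001u, 0xFF000000u },   /* 10.0.0.1/8 */
};
static unsigned demo_route_calls;   /* counts full lookups, for measurement */

/* Full lookup: linear scan over the interfaces on every call. */
static struct demo_netif *demo_route(uint32_t dst)
{
    size_t i;
    demo_route_calls++;
    for (i = 0; i < sizeof demo_netifs / sizeof demo_netifs[0]; i++) {
        if ((dst & demo_netifs[i].mask) ==
            (demo_netifs[i].addr & demo_netifs[i].mask))
            return &demo_netifs[i];
    }
    return NULL;
}

/* Cached lookup: reuse the stored interface while the destination is unchanged. */
static struct demo_netif *demo_route_cached(struct demo_pcb *pcb, uint32_t dst)
{
    assert(pcb != NULL);
    if (pcb->cached_if == NULL || pcb->cached_dst != dst) {
        pcb->cached_if = demo_route(dst);
        pcb->cached_dst = dst;
    }
    return pcb->cached_if;
}

/* Call whenever an interface is added, removed, or readdressed. */
static void demo_route_invalidate(struct demo_pcb *pcb)
{
    pcb->cached_if = NULL;
}
```

Sending 1000 packets to the same destination through `demo_route_cached` performs one full lookup; the cost of the extra compare per packet is the trade-off.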
Switching to the less portable udp_sendto_if made a noticeable improvement to sending bandwidth (over 10%). The next thing I notice is that pbuf_header is called 3 times for each udp_sendto_if call. Shouldn’t a pbuf know its intended usage and the first call to pbuf_header reserve all the space required for headers with lower layers knowing that it’s been done? I do not have a solution for this but I suspect a good bit of time is lost on the 2 extra calls to pbuf_header.
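As an illustration of what "reserve all the header space once" could look like, here is a small sketch in plain C. It is not lwIP's pbuf: `demo_buf` and the `demo_` helpers are hypothetical, and the header sizes assume UDP over IPv4 over Ethernet with no IP options, no VLAN tag and no alignment padding. The payload is placed once behind a fixed headroom, and each layer then finds its header at a constant offset instead of adjusting the payload pointer itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed header sizes: Ethernet 14, IPv4 without options 20, UDP 8. */
enum {
    DEMO_ETH_HLEN = 14,
    DEMO_IP_HLEN = 20,
    DEMO_UDP_HLEN = 8,
    DEMO_HEADROOM = DEMO_ETH_HLEN + DEMO_IP_HLEN + DEMO_UDP_HLEN,
    DEMO_MAX_PAYLOAD = 1472
};

/* Hypothetical buffer type; this is NOT lwIP's struct pbuf. */
struct demo_buf {
    uint8_t storage[DEMO_HEADROOM + DEMO_MAX_PAYLOAD];
    uint8_t *payload;   /* start of application data */
    uint16_t len;       /* application data length */
};

/* Lay out the payload once, leaving room in front for every header. */
static void demo_buf_init(struct demo_buf *b, const void *data, uint16_t len)
{
    assert(len <= DEMO_MAX_PAYLOAD);
    b->payload = b->storage + DEMO_HEADROOM;
    b->len = len;
    memcpy(b->payload, data, len);
}

/* Each layer's header sits at a fixed offset in front of the payload,
 * so no per-layer pointer adjustment or bounds check is needed. */
static uint8_t *demo_udp_hdr(struct demo_buf *b) { return b->payload - DEMO_UDP_HLEN; }
static uint8_t *demo_ip_hdr(struct demo_buf *b)  { return demo_udp_hdr(b) - DEMO_IP_HLEN; }
static uint8_t *demo_eth_hdr(struct demo_buf *b) { return demo_ip_hdr(b) - DEMO_ETH_HLEN; }

/* Total frame length handed to the driver. */
static uint16_t demo_frame_len(const struct demo_buf *b)
{
    return (uint16_t)(DEMO_HEADROOM + b->len);
}
```

The obvious cost is that the buffer is tied to one protocol stack layout; a buffer meant for a different link layer or for IP options would need a different headroom.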
The inet_chksum call in ip_output_if is another noticeable detriment to speed (about 10%). I changed this to checksum inline as the header is filled in to avoid the call to inet_chksum. The netif->output call calls my driver which sets up a DMA and starts the transfer.
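The inline checksum change can be sketched roughly as follows. This is not the code I put into ip_output_if; it is a standalone illustration with hypothetical `demo_` names, assuming a 20-byte IPv4 header without options. Each 16-bit word is added to a running one's-complement sum as it is written, so the header is never read back for a separate checksum pass.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator to 16 bits and invert it. */
static uint16_t demo_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Store a 16-bit word in network byte order and add it to the running sum. */
static uint32_t demo_put16(uint8_t *p, uint16_t v, uint32_t sum)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
    return sum + v;
}

/* Fill a 20-byte IPv4 header (no options) and checksum it in the same pass. */
static uint16_t demo_ip_fill(uint8_t hdr[20], uint16_t total_len, uint16_t id,
                             uint16_t flags_frag, uint8_t ttl, uint8_t proto,
                             uint32_t src, uint32_t dst)
{
    uint32_t sum = 0;
    uint16_t chk;

    assert(total_len >= 20);                            /* at least the header */
    sum = demo_put16(hdr + 0, 0x4500, sum);             /* version 4, IHL 5, TOS 0 */
    sum = demo_put16(hdr + 2, total_len, sum);
    sum = demo_put16(hdr + 4, id, sum);
    sum = demo_put16(hdr + 6, flags_frag, sum);
    sum = demo_put16(hdr + 8, (uint16_t)((ttl << 8) | proto), sum);
    sum = demo_put16(hdr + 12, (uint16_t)(src >> 16), sum);
    sum = demo_put16(hdr + 14, (uint16_t)(src & 0xFFFF), sum);
    sum = demo_put16(hdr + 16, (uint16_t)(dst >> 16), sum);
    sum = demo_put16(hdr + 18, (uint16_t)(dst & 0xFFFF), sum);

    chk = demo_fold(sum);
    (void)demo_put16(hdr + 10, chk, 0);                 /* checksum field goes in last */
    return chk;
}
```

For a UDP datagram with total length 0x73, the don't-fragment flag set, TTL 64, from 192.168.0.1 to 192.168.0.199, this yields a header checksum of 0xB861.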
There really isn’t much going on here, especially without UDP payload checksumming. I’ve been at a loss as to why bandwidth was about a sixth of what is reasonably possible. Now it’s about a third of what should be possible – this is only 30% without checksumming! I should get to 50-70% of reasonable with memory improvements but I need to do better. I know TCP is about half the speed of UDP so until UDP is improved, TCP isn’t an option for us.
My tests are on a NIOS II at 100 MHz, with 133 MHz 32-bit SDRAM, 1 Gbit Ethernet with DMA to/from the MAC, and NO_SYS=1, although I believe NO_SYS doesn’t matter for UDP.
Thanks for reading,