Re: [lwip-devel] UDP - transmission inefficiences

From: Kieran Mansley
Subject: Re: [lwip-devel] UDP - transmission inefficiences
Date: Tue, 09 Jun 2009 16:18:36 +0100

On Mon, 2009-06-08 at 17:46 -0400, Bill Auerbach wrote:
> The first thing I notice using udp_sendto is that ip_route is called
> for *every* call to udp_sendto.  Shouldn’t the “if” be stored in the
> pcb, once the “if” is known, so that ip_route is called only once per
> pcb?  If this is not a good idea, in which cases will the interface
> change for a pcb?

There's a trade-off between the size of the pcb and CPU overhead.
Because the interface can change (whenever the implicit routing table
changes), caching it in the pcb would also require some mechanism for
invalidating that cache.  Changes should be rare, so this might not be
so hard, but I think it is the way it is simply because that was
simpler.
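
To make the invalidation point concrete, here is a rough sketch of one
possible scheme: a global generation counter that is bumped on every
routing change and compared against a copy stored in the pcb.  All the
names below (sketch_netif, sketch_pcb, sketch_route and so on) are
made up for illustration and are not lwIP's real structures or
functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT lwIP's real struct netif / struct udp_pcb. */
struct sketch_netif {
    uint32_t ip;
    uint32_t netmask;
    struct sketch_netif *next;
};

struct sketch_pcb {
    struct sketch_netif *cached_netif; /* the extra per-pcb memory cost */
    uint32_t cached_dest;
    unsigned cached_gen;               /* generation the cache was filled in */
};

static struct sketch_netif *netif_list;
static unsigned route_gen = 1;         /* bumped on every routing change */
static unsigned long route_lookups;    /* counts full lookups, for measurement */

/* Adding an interface changes the implicit routing table, so bump the
 * generation; every pcb cache then becomes stale at once. */
void sketch_netif_add(struct sketch_netif *n)
{
    n->next = netif_list;
    netif_list = n;
    route_gen++;
}

/* Full lookup: walk the interface list and match on the subnet. */
struct sketch_netif *sketch_route(uint32_t dest)
{
    struct sketch_netif *n;

    route_lookups++;
    for (n = netif_list; n != NULL; n = n->next) {
        if ((dest & n->netmask) == (n->ip & n->netmask)) {
            return n;
        }
    }
    return NULL;
}

/* Cached lookup: only fall back to the full lookup when the routing
 * generation or the destination differs from what the pcb remembers. */
struct sketch_netif *sketch_route_cached(struct sketch_pcb *pcb, uint32_t dest)
{
    if (pcb->cached_gen != route_gen || pcb->cached_dest != dest) {
        pcb->cached_netif = sketch_route(dest);
        pcb->cached_dest = dest;
        pcb->cached_gen = route_gen;
    }
    return pcb->cached_netif;
}
```

This costs a pointer, an address and a counter per pcb, and it assumes
that every code path which alters routing (interface added, removed,
re-addressed, brought up or down) remembers to bump the counter.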

> Switching to the less portable udp_sendto_if made a noticeable
> improvement to sending bandwidth (over 10%).  The next thing I notice
> is that pbuf_header is called 3 times for each udp_sendto_if call.
> Shouldn’t a pbuf know its intended usage and the first call to
> pbuf_header reserve all the space required for headers with lower
> layers knowing that it’s been done?

It's a bit of a layer violation to do things that way, and the code is
a bit simpler as it stands: each layer knows it just has to look at the
payload pointer, and doesn't have to add on amounts that depend on the
sizes of other layers' headers.  Again, it's a case of simple winning
over performance in the current implementation.
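
As a simplified illustration of the layering described above (not the
real pbuf code; sketch_pbuf, sketch_pbuf_init and sketch_pbuf_header
are invented names, and the header sizes assume plain UDP over IPv4
over Ethernet with no options), each layer only ever moves the payload
pointer back by the size of its own header:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_UDP_HLEN 8
#define SKETCH_IP_HLEN  20
#define SKETCH_ETH_HLEN 14

/* Hypothetical, much-reduced buffer; NOT lwIP's struct pbuf. */
struct sketch_pbuf {
    uint8_t *payload;      /* start of the data the current layer sees */
    uint16_t len;          /* bytes valid from payload onwards */
    uint8_t storage[128];
};

/* Place the payload pointer 'headroom' bytes into the storage.  The
 * caller must keep headroom plus data within the storage size. */
void sketch_pbuf_init(struct sketch_pbuf *p, uint16_t headroom)
{
    p->payload = p->storage + headroom;
    p->len = 0;
}

/* Grow (delta > 0) or shrink (delta < 0) the header area in front of
 * the payload.  Returns 0 on success, 1 if there is not enough room. */
int sketch_pbuf_header(struct sketch_pbuf *p, int16_t delta)
{
    if (delta > 0 && (p->payload - p->storage) < delta) {
        return 1;          /* not enough headroom left */
    }
    if (delta < 0 && p->len < -delta) {
        return 1;          /* cannot hide more bytes than we hold */
    }
    p->payload -= delta;
    p->len = (uint16_t)(p->len + delta);
    return 0;
}
```

With this shape, the UDP layer calls sketch_pbuf_header(p,
SKETCH_UDP_HLEN), the IP layer calls it with SKETCH_IP_HLEN, and the
link layer with SKETCH_ETH_HLEN.  Collapsing those into one call would
mean the UDP layer has to know the sum of the lower layers' header
sizes, which is the layer violation in question.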

>  I do not have a solution for this but I suspect a good bit of time is
> lost on the 2 extra calls to pbuf_header.

It would be interesting to quantify this.  I'd guess that it's not all
that significant, but if you can show otherwise it may be worth looking
into.

> The inet_chksum call in ip_output_if is another noticeable detriment
> to speed (about 10%).  I changed this to checksum inline as the header
> is filled in to avoid the call to inet_chksum.

A patch for that would be good - I can't see any problem with including
that in the core distribution.
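
For reference, this is roughly the sort of thing I would expect such a
change to look like.  It is only a sketch under my own assumptions (a
fixed 20-byte IPv4 header with no options, and invented names
sketch_chksum, sketch_put16 and sketch_ip_fill), not the actual patch
and not the existing ip_output_if code.  The header words are summed as
they are written, so no second pass over the header is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic one's-complement checksum over a buffer, shown only for
 * comparison with the inline version below. */
uint16_t sketch_chksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t)data[len - 1] << 8;
    }
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)~sum;
}

/* Store a 16-bit value in network byte order and return it so the
 * caller can add it to the running sum. */
static uint32_t sketch_put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFFu);
    return v;
}

/* Fill in a 20-byte IPv4 header, accumulating the checksum inline. */
void sketch_ip_fill(uint8_t hdr[20], uint16_t total_len, uint16_t id,
                    uint16_t flags_frag, uint8_t ttl, uint8_t proto,
                    uint32_t src, uint32_t dst)
{
    uint32_t sum = 0;

    sum += sketch_put16(hdr + 0, 0x4500u);      /* version 4, IHL 5, TOS 0 */
    sum += sketch_put16(hdr + 2, total_len);
    sum += sketch_put16(hdr + 4, id);
    sum += sketch_put16(hdr + 6, flags_frag);
    sum += sketch_put16(hdr + 8, (uint16_t)(((uint16_t)ttl << 8) | proto));
    sum += sketch_put16(hdr + 12, (uint16_t)(src >> 16));
    sum += sketch_put16(hdr + 14, (uint16_t)(src & 0xFFFFu));
    sum += sketch_put16(hdr + 16, (uint16_t)(dst >> 16));
    sum += sketch_put16(hdr + 18, (uint16_t)(dst & 0xFFFFu));

    /* Nine 16-bit words cannot overflow past two folds. */
    sum = (sum & 0xFFFFu) + (sum >> 16);
    sum = (sum & 0xFFFFu) + (sum >> 16);
    sketch_put16(hdr + 10, (uint16_t)~sum);     /* checksum field */
}
```

A quick sanity check is to run the generic routine over the finished
header: with a correct checksum in place it should come out as zero.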

>   The netif->output call calls my driver which sets up a DMA and
> starts the transfer.
> There really isn’t much going on here, especially without UDP payload
> checksumming.  I’ve been at a loss as to why bandwidth was about a
> sixth of what is reasonably possible. Now it’s about a third of what
> should be possible – this is only 30% without checksumming!  I should
> get to 50-70% of reasonable with memory improvements but I need to do
> better.

I would start by profiling code sections to see where the time is being
spent, but I'm not sure what tools are available to you to do this sort
of thing on your system.  I would guess that a lot of it is to do with
the overheads of supporting the higher-layer APIs, and that with the
raw API you'd immediately get much closer to line rate.
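
If nothing better is to hand, a crude way to profile sections is to
wrap them in begin/end calls that accumulate elapsed time.  The sketch
below uses the standard C clock() purely as a placeholder; on an
embedded target you would most likely substitute a hardware timer or
cycle counter, and sketch_prof and its functions are invented names.

```c
#include <time.h>

/* One accumulator per code section being measured (hypothetical). */
struct sketch_prof {
    const char *name;
    clock_t total;        /* accumulated ticks inside the section */
    unsigned long calls;  /* number of times the section ran */
    clock_t started;      /* timestamp taken at the last begin */
};

/* Call on entry to the section. */
void sketch_prof_begin(struct sketch_prof *s)
{
    s->started = clock();
}

/* Call on exit from the section. */
void sketch_prof_end(struct sketch_prof *s)
{
    s->total += clock() - s->started;
    s->calls++;
}

/* Total time spent in the section, in seconds. */
double sketch_prof_seconds(const struct sketch_prof *s)
{
    return (double)s->total / CLOCKS_PER_SEC;
}
```

Placing one accumulator around the routing lookup, one around the
header handling and one around the driver's output call would show
which of them dominates, bearing in mind that the timer calls
themselves add some overhead.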
