[lwip-users] Re: LPC2468+lwIP+FreeRTOS+GCC

From:	pweb . ing
Subject:	[lwip-users] Re: LPC2468+lwIP+FreeRTOS+GCC
Date:	Fri, 9 Jan 2009 00:37:04 +0100

Chris,

i really don't understand your interest in my lwip port if you
have already decided that niche is better.
i shared my code with you thinking that you could provide other
feedback, debugging and ideas to the lwip community. i'd like to know what
you will do with my code.... will you use it only for comparison with niche?

regards
piero


2009/1/8, Chris Strahm <address@hidden>:
> Hi Piero:
>
>>> ah ok... i used freertos 472... it could be possible that sys_arch.c
>>> needs
>>> some changes using a newer version...
>>> can you understand where there is a problem?
>
> I wanted to do that, but I never got the time.  When the original 5.0 file
> worked I just never got back to it.  I believe it is causing some major
> fault in FreeRTOS because when I run the code it goes straight into the
> weeds somewhere inside FreeRTOS.  I've attached the two files.  I'm sure you
> know this code far better than I do so you might be able to spot the reason
> quickly.  To me, the two files are completely different.  Yours is 2X the
> size of the other.  There are so many differences between the files I really
> had no idea which area might be the cause without spending a lot of time.
>
>>> - emac DMA  wants 16 bit alignment, and buffers configured
>>> by application for RX might not be aligned on 32 bit, so,
>>> copying from rx, i can use only 16 bit copy
>
> I have the 32b RX copy running already.  That seemed to be very little
> problem.
>
>>> - emac DMA  wants 16 bit alignment, and buffers configured by application
>>> for TX could be aligned on 8 bit only, so i can use only 8 bit for copy
>
> I understand.  I noticed that the source buffer address fed to netconn_write
> never shows up in the emac routine, so I knew lwIP was copying the data
> inside.  I assume it is using memcpy, which I think will do 8/16/32
> automatically as efficiently as it can.  So I figured if the internal lwIP
> buffer is (4) aligned, then it should be no problem using 32b copy to the
> DMA buffers.  You could still feed it (1) aligned http data at the front end
> since lwIP is copying the data anyway, it ends up (4) aligned when it gets
> to emac TX.
>
>>> anyway, the best optimization in the driver is ZERO COPY.
>
> I agree, however my thought was that if more copying is being done it is
> even more important that it be done efficiently.  Doing 8b copy means 4X as
> many calls through the loop with flag bit checking, calcs, etc besides the
> difference in memory access.
>
> I had a couple of the NXP FAEs in my office last summer and we were talking
> about the EMAC performance in the LPC2468 in some detail.  At the time my
> design was planned with a STR912 (ARM9).  The LPC2468 is much faster Enet
> than the STR912.  The CMX TCP/IP stack and the Niche TCP/IP stack both have
> zero copy and they do 99Mb/sec on the LPC2468.  The DMA engine with its own
> AHB bus in the LPC24XX emac is a speed demon.  It's hard for the CPU to copy
> the data in just to keep up.  They said you really need to use full 32b word
> size copy to get the data in fast enough.  With Full Duplex and 32b data
> flow the CPU is at 60% utilization keeping the DMA full.  With 8b copy it
> simply cannot do it.  If you have zero copy elsewhere, then you cannot get
> max speed without 32b copy to DMA.
>
> It appears that most everything being sent to the emac TX is (4) aligned.
> However there is one tiny block being sent that is only 58 bytes in size,
> and it is (2) aligned.  My guess is that some control stub or small data
> array was defined someplace in lwIP, and there was no alignment set on it.
> I'm going hunting for it.
>
> The other option is to add some additional smarts to the emac TX copy
> routine so it uses 8b/16b/32b copy on the fly based on the alignment of the
> pointers.  That's very easy to setup too.  If the rest was zero copy, then
> this might actually be the best option because it would give maximum
> performance regardless of the alignment of the data at the front end.
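
The on-the-fly width selection described above could look roughly like the sketch below. It is only an illustration: `emac_copy` is a made-up name, not a function from lwIP or from any LPC24xx driver, and the direct word and halfword accesses assume a toolchain that tolerates this kind of pointer casting (for GCC, typically built with -fno-strict-aliasing).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of lwIP or any NXP code): copy len bytes
 * using the widest access that the alignment of BOTH pointers allows. */
void emac_copy(void *dst, const void *src, size_t len)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;
    /* OR the addresses: a low bit set in either one shows up here. */
    uintptr_t both = (uintptr_t)d | (uintptr_t)s;

    if ((both & 3u) == 0u) {
        /* Both 4-aligned: move 32-bit words. */
        for (; len >= 4u; len -= 4u, d += 4, s += 4)
            *(uint32_t *)d = *(const uint32_t *)s;
    } else if ((both & 1u) == 0u) {
        /* Both at least 2-aligned: move 16-bit halfwords. */
        for (; len >= 2u; len -= 2u, d += 2, s += 2)
            *(uint16_t *)d = *(const uint16_t *)s;
    }

    /* Leftover tail bytes, or the whole buffer if either pointer is odd. */
    while (len > 0u) {
        *d++ = *s++;
        len--;
    }
}
```

With this shape, the 58-byte block that arrives (2) aligned would take the halfword path, while everything that is (4) aligned goes through the word loop.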
>
>>> what i wanted to tell you is... send SHORT debug msgs on uart...
>>> this is the reason why i'm thinking of encoding the msgs.
>
> I had similar thoughts, but I planned on setting up a small FIFO buffer.
> That way lwIP can send strings as fast as it wants and the UART can dribble
> them out as long as it takes.  Probably put it in its own task as well.
> Should have negligible impact on the stack.
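
A small debug FIFO of the kind mentioned above might be sketched as follows. All names (`dbg_puts`, `dbg_getc`, `DBG_FIFO_SIZE`) are invented for illustration, and the code assumes one producer and one consumer on a single core; several tasks calling `dbg_puts` would need a critical section around it.

```c
#include <stddef.h>
#include <stdint.h>

#define DBG_FIFO_SIZE 256u   /* assumed size; must be a power of two */

static volatile uint8_t  dbg_fifo[DBG_FIFO_SIZE];
static volatile uint32_t dbg_head;   /* advanced only by the producer */
static volatile uint32_t dbg_tail;   /* advanced only by the consumer */

/* Producer side: queue as much of the string as fits and drop the rest,
 * so the caller never blocks. Returns the number of bytes queued. */
size_t dbg_puts(const char *s)
{
    size_t queued = 0;

    while (*s != '\0' && (dbg_head - dbg_tail) < DBG_FIFO_SIZE) {
        dbg_fifo[dbg_head & (DBG_FIFO_SIZE - 1u)] = (uint8_t)*s++;
        dbg_head++;
        queued++;
    }
    return queued;
}

/* Consumer side (UART task or TX interrupt): returns the next byte,
 * or -1 when the FIFO is empty. */
int dbg_getc(void)
{
    int c;

    if (dbg_head == dbg_tail)
        return -1;
    c = dbg_fifo[dbg_tail & (DBG_FIFO_SIZE - 1u)];
    dbg_tail++;
    return c;
}
```

The UART side would call `dbg_getc()` whenever the transmitter is ready and feed the byte to the hardware; dropping on overflow is one possible policy, chosen here so the stack is never stalled by debug output.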
>
>>> We are discussing it on the lwip-dev mailing list. Join in this discussion,
>>> and read old emails.
>
>
> I have the NicheLite stack code also, and 2 weeks ago I was working to get
> that setup.  Got about 75% done and then went back on this lwIP.  The Niche
> code is very well done, very high performance, very clean.  Asm for critical
> sections, just great code.  They actually use 2 different sizes for the
> packet buffers: 128 bytes and 1536 bytes.  They use a lot of the little
> buffers and a few of the big, then split the traffic between them based on
> size.  They say typical traffic profile falls into 2 categories: small
> packets for Acks, ARP, DNS, ICMP, Broadcasts, etc. or big payloads in
> TCP/UDP/FTP etc.  Having both big/small packets uses memory far more
> efficiently and stops the tiny packets from wasting the big buffers.  Pretty
> smart.  They also have a great debugger built into the code that you can
> feed commands to and get running status and performance details on the fly.
> Ideal for tuning.
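
To make the two-size buffer idea concrete, here is a rough sketch. It is not the Niche code: the names and the buffer counts are invented, only the 128 and 1536 byte sizes come from the description above, and letting a small request fall back to a big buffer when the small pool is empty is just one possible policy. It is also not thread safe as written.

```c
#include <stddef.h>
#include <stdint.h>

#define SMALL_BUF_SIZE   128u
#define BIG_BUF_SIZE     1536u
#define SMALL_BUF_COUNT  16u   /* counts are illustrative guesses */
#define BIG_BUF_COUNT    4u

/* uint32_t storage keeps every buffer 4-byte aligned for 32-bit copies. */
static uint32_t small_pool[SMALL_BUF_COUNT][SMALL_BUF_SIZE / 4u];
static uint32_t big_pool[BIG_BUF_COUNT][BIG_BUF_SIZE / 4u];
static uint8_t  small_used[SMALL_BUF_COUNT];
static uint8_t  big_used[BIG_BUF_COUNT];

/* Hand out a buffer from the pool that matches the packet length.
 * Returns NULL if the request is too large or nothing is free. */
void *pkt_alloc(size_t len)
{
    size_t i;

    if (len <= SMALL_BUF_SIZE) {
        for (i = 0; i < SMALL_BUF_COUNT; i++) {
            if (!small_used[i]) {
                small_used[i] = 1;
                return small_pool[i];
            }
        }
        /* Small pool exhausted: fall through and try a big buffer. */
    }
    if (len <= BIG_BUF_SIZE) {
        for (i = 0; i < BIG_BUF_COUNT; i++) {
            if (!big_used[i]) {
                big_used[i] = 1;
                return big_pool[i];
            }
        }
    }
    return NULL;
}

/* Return a buffer to whichever pool it came from. */
void pkt_free(void *p)
{
    size_t i;

    for (i = 0; i < SMALL_BUF_COUNT; i++) {
        if (p == (void *)small_pool[i]) { small_used[i] = 0; return; }
    }
    for (i = 0; i < BIG_BUF_COUNT; i++) {
        if (p == (void *)big_pool[i]) { big_used[i] = 0; return; }
    }
}
```

A 58-byte ACK-sized packet would land in a 128-byte buffer here and leave the few 1536-byte buffers free for full-size frames.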
>
> My primary concern for my present app is high speed TCP performance.  The
> http requirement is minimal.  I am using private ports to send bulk megabyte
> data streams.  The more speed I can get the better.
>
> I may yet try to setup dual size buffers in this emac port as well.  It's
> not that difficult to do, and I think it is a very smart idea.  As for the
> zero copy, I thought about that but I really hate working on this lwIP code.
>  All of the macros and inline functions make it very difficult to trace and
> debug.  Compared to the Niche code, it is badly organized, poorly
> structured, missing header prototypes, etc. that make it very difficult to
> figure out what is being used and where it's at.  Very tedious, time
> consuming, prone to errors, and frustrating to work on.  I am clearly not in
> the lwIP fan club.  I really wasn't looking for another science project when
> I started this!  :))
>
> Best Regards,  Chris Strahm
>
>
