From: address@hidden
Subject: Re: [lwip-users] TCP bandwidth limited by rate of ACKs
Date: Wed, 12 Oct 2011 21:01:13 +0200
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1

Mason wrote:
> Bill Auerbach wrote:
>> That 7.4% for memcpy is a direct hit on throughput. You're seeing a breakdown of total CPU time. How much of that 7+% for memcpy comes out of the total time used by lwIP? I think you'll find that to be a much larger hit and a large contributor to lower bandwidth.
>
> Bill, IMHO, the elephant in the room is task-switching, as correctly pointed out by Kieran.

Well, given a correctly DMA-enabled driver, you could avoid one task switch by checking for RX packets from tcpip_thread instead of using a separate thread for RX (which is what the name "RxTask" in your "Task breakdown" suggests you do). Your ISR would then set a flag or post a statically allocated message, tcpip_thread would process the packet in place (without having to copy it), and the data would then be posted to your application thread.
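To make the ISR-to-tcpip_thread hand-off concrete, here is a minimal single-threaded C model. It is not lwIP code: the descriptor layout, `eth_rx_isr()`, `handle_rx_msg()`, the `nic_deliver()` test hook and the counter standing in for tcpip_thread's mailbox are all hypothetical. It shows the two ideas from the paragraph above. First, the ISR posts one static message, guarded by a flag so that a burst of interrupts queues it only once. Second, tcpip_thread drains every ready DMA descriptor in place. A real driver would also need volatile descriptors and memory barriers, which are omitted here.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RX_RING_SIZE 8

/* Hypothetical DMA RX descriptor: the NIC fills buf, sets len, then sets ready. */
struct rx_desc {
    unsigned char buf[1536];
    size_t len;
    bool ready;
};

static struct rx_desc rx_ring[RX_RING_SIZE];
static size_t rx_head;                  /* next descriptor tcpip_thread inspects */

/* A single, statically allocated "RX pending" message: the ISR never allocates. */
static atomic_bool rx_msg_pending = false;
static unsigned messages_posted;        /* stand-in for tcpip_thread's mailbox */

/* ISR side: post the static message only if it is not already queued. */
static void eth_rx_isr(void) {
    bool expected = false;
    if (atomic_compare_exchange_strong(&rx_msg_pending, &expected, true)) {
        messages_posted++;              /* real code would post to the mailbox here */
    }
}

/* Hypothetical test hook: simulate the NIC completing a DMA transfer. */
static void nic_deliver(size_t idx, size_t len) {
    rx_ring[idx % RX_RING_SIZE].len = len;
    rx_ring[idx % RX_RING_SIZE].ready = true;
    eth_rx_isr();
}

/* tcpip_thread side: clear the flag first (so a frame arriving mid-drain
   re-posts the message), then process every ready descriptor in place.
   Returns the number of frames handled and adds their sizes to *bytes. */
static size_t handle_rx_msg(size_t *bytes) {
    size_t frames = 0;
    atomic_store(&rx_msg_pending, false);
    while (rx_ring[rx_head].ready) {
        /* A real stack would feed the frame to its input path here, reading it
           where DMA left it; this sketch only accounts for its length. */
        *bytes += rx_ring[rx_head].len;
        rx_ring[rx_head].ready = false; /* hand the descriptor back to the NIC */
        rx_head = (rx_head + 1) % RX_RING_SIZE;
        frames++;
    }
    return frames;
}
```

With three back-to-back frames, only one message reaches tcpip_thread, and that thread handles all three frames in a single wake-up. This is where the saved task switch comes from.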
Also, by using the (still somewhat experimental) LWIP_TCPIP_CORE_LOCKING feature, you can avoid the task switch from the application task to tcpip_thread: the application locks the core with a mutex and calls into it directly instead of passing a message.
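The difference between the two call paths can be sketched as a toy C model. The names `api_send_msg()`, `api_send_locked()` and `core_send()` are hypothetical, and the switch counter is a modelled cost, not a measurement. In the message-passing path, the caller blocks while tcpip_thread runs the request, which costs two switches per call. In the core-locking path, the caller takes a mutex (roughly the role lwIP's core lock plays when LWIP_TCPIP_CORE_LOCKING is enabled) and runs the core code in its own context.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for state owned by the stack core. */
static size_t core_bytes_sent;

/* Modelled (not measured) count of context switches caused by API calls. */
static unsigned long modelled_switches;

static pthread_mutex_t core_mutex = PTHREAD_MUTEX_INITIALIZER;

/* The core operation itself, e.g. queueing data on a TCP connection. */
static void core_send(size_t len) { core_bytes_sent += len; }

/* Message-passing API: post a request to tcpip_thread and wait for the reply.
   Modelled as two switches: application -> tcpip_thread -> application. */
static void api_send_msg(size_t len) {
    modelled_switches += 2;
    core_send(len);          /* would execute in tcpip_thread's context */
}

/* Core-locking API: run the core operation in the caller's own context.
   No switch is needed unless another task currently holds the mutex. */
static void api_send_locked(size_t len) {
    pthread_mutex_lock(&core_mutex);
    core_send(len);
    pthread_mutex_unlock(&core_mutex);
}
```

Both paths do the same core work. Only the message-passing one adds scheduler round trips, which is the cost the feature removes.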
> Assuming that every memcpy were lwip-related, and that I could get rid of them (which I don't see how, given Simon's comments) the transfer would take 478 instead of 516 seconds.

I didn't mean to discourage you with my comments; I only meant that it doesn't work out of the box with the current lwIP. However, I know it's not easy for an lwIP beginner to make the changes required on the RX side (the TX side should not be a problem, since it can be handled by adapting the mem_malloc() functions).
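Mason's estimate of 478 instead of 516 seconds follows from Bill's 7.4% memcpy share of CPU time. It requires assuming that all of that time disappears and that transfer time scales linearly with CPU time. The linear scaling is an assumption made here for the check and is not stated in the thread:

```latex
t_{\text{new}} \approx t_{\text{old}}\,(1 - f_{\text{memcpy}})
             = 516\,\text{s} \times (1 - 0.074)
             \approx 477.8\,\text{s} \approx 478\,\text{s}
```

That is a saving of roughly 38 s, or about 7%.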
If I made the changes in git to support PBUF_REF for RX, would you be able to switch to that version for testing?
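The idea behind using a reference-type buffer for RX is that the stack points at the frame where DMA wrote it instead of copying it into its own memory. Freeing the buffer must then hand the DMA buffer back to the driver. The C sketch below models only that ownership hand-off. `struct rx_ref`, `rx_wrap()`, `rx_ref_free()` and the fixed buffer array are hypothetical and are not lwIP's `struct pbuf` or its API.

```c
#include <stdbool.h>
#include <stddef.h>

#define RX_BUFS 4
#define RX_BUF_SIZE 1536

/* DMA-capable receive buffers the NIC writes into (hypothetical layout). */
static unsigned char dma_buf[RX_BUFS][RX_BUF_SIZE];
static bool owned_by_nic[RX_BUFS] = { true, true, true, true };

/* Hypothetical reference-type buffer: like a PBUF_REF pbuf, it points at
   memory it does not own instead of holding a copy of the frame. */
struct rx_ref {
    const unsigned char *payload;
    size_t len;
    size_t dma_index;           /* which DMA buffer to give back on free */
};

/* Driver: wrap a filled DMA buffer without copying a single byte. */
static struct rx_ref rx_wrap(size_t idx, size_t len) {
    owned_by_nic[idx] = false;  /* the stack owns the buffer until it frees it */
    struct rx_ref r = { dma_buf[idx], len, idx };
    return r;
}

/* Called when the stack is done with the frame: recycle the buffer to the NIC. */
static void rx_ref_free(struct rx_ref *r) {
    owned_by_nic[r->dma_index] = true;
    r->payload = NULL;
    r->len = 0;
}
```

The payload pointer is the DMA buffer itself, so no memcpy happens on receive. The cost is that a buffer held by a slow consumer is unavailable to the NIC until it is freed.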
I plan to implement zero-copy on an ARM-based board I have here, but I haven't found the time for it lately :-(
Simon