
Re: [lwip-users] byte alignment


From: address@hidden
Subject: Re: [lwip-users] byte alignment
Date: Thu, 06 May 2010 06:41:20 +0200
User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; de; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4

Tyrel Newton wrote:

However, if the processor does the final copy (without a DMA engine), then it's a bad thing if the data is not aligned. But you should be able to include a DMA engine in your FPGA, so...
Xilinx provides a gigabit MAC with built-in DMA (at an additional cost, of course), so I definitely have options. I could also write my own DMA or, for that matter, my own non-DMA Ethernet MAC that simply accepts and discards a two-byte pad. But all of that is outside the scope (and priority) of my current effort. At the moment, I'm not terribly concerned about Ethernet performance as long as it works and isn't horrendously slow. My investigation into this issue came from rewriting the horrible lwIP driver provided by Xilinx. By rewriting the code in a reasonably intelligent manner, I managed to increase the throughput 4x and make the system more stable. C code is easier to change than VHDL . . .
I meant just include a standard RAM-to-RAM DMA controller (at least Altera provides something like that for free) and let it copy from your real RAM to the MAC's transmit-buffer RAM. For me, adding the DMA controller and recompiling the FPGA took only about an hour; the code to use it is quite simple and a lot faster than a processor memcpy.
Single PBUF_RAM pbufs or chained pbufs?
Single PBUF_RAM pbufs. Looking through the TCP code, if the data is being copied into the stack (i.e. via NETCONN_COPY), I'm not even sure how chained pbufs would be created.
Not for the netconn layer, no.
I wouldn't say the system I'm using (at the moment at least) is zero-copy, because once I receive the frame from lwIP, I pbuf_ref it, queue it up for transmit, and then eventually copy its payload to the MAC's transmit buffer, after which I do a pbuf_free. Although I guess this is still zero-copy from the stack's frame of reference . . . it's probably worth distinguishing somewhere between zero-copy MACs and zero-copy drivers.
It is zero-copy but without delayed transmit, and therefore it's a bit outside the scope of that task. However, it would be non-zero-copy, and within the scope of that task, if you were to first copy the data to an aligned buffer and then copy that buffer to the MAC.
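
To illustrate the single-copy case (not code from this thread, just a minimal sketch): the processor can read the payload byte by byte, wherever it starts, assemble 32-bit words in a register, and write only whole words to the MAC's transmit RAM, so no intermediate aligned buffer is needed. Everything named here is an assumption: `struct frame_seg` is a hypothetical stand-in for the `next`/`payload`/`len` fields of a pbuf, `tx_copy_words` is a made-up helper, and the transmit RAM is assumed to be word-addressed with the first byte of the frame in the low bits of each word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the pbuf fields a driver walks on transmit. */
struct frame_seg {
    const struct frame_seg *next; /* next segment of the chain, or NULL */
    const uint8_t *payload;       /* may start at any byte address */
    uint16_t len;                 /* bytes in this segment */
};

/*
 * Copy a (possibly chained) frame into a word-wide transmit buffer.
 * Source bytes are read one at a time, so an unaligned payload is fine;
 * the destination is only ever written as whole 32-bit words.
 * Assumption: byte 0 of the frame goes into bits 7..0 of word 0.
 * Returns the frame length in bytes, or 0 if it does not fit (in which
 * case part of the buffer may already have been overwritten).
 */
size_t tx_copy_words(volatile uint32_t *tx_words, size_t tx_capacity_words,
                     const struct frame_seg *seg)
{
    size_t total = 0;   /* bytes consumed so far */
    uint32_t word = 0;  /* word being assembled */
    unsigned fill = 0;  /* bytes currently held in 'word' */

    assert(tx_words != NULL);

    for (; seg != NULL; seg = seg->next) {
        for (uint16_t i = 0; i < seg->len; i++) {
            if (total >= tx_capacity_words * 4u) {
                return 0; /* frame too large for the transmit buffer */
            }
            word |= (uint32_t)seg->payload[i] << (8u * fill);
            fill++;
            total++;
            if (fill == 4u) {
                tx_words[total / 4u - 1u] = word; /* flush a full word */
                word = 0;
                fill = 0;
            }
        }
    }
    if (fill != 0u) {
        tx_words[total / 4u] = word; /* flush the zero-padded tail word */
    }
    return total;
}
```

In a real driver the segment list would be the pbuf chain itself and `tx_words` the memory-mapped transmit RAM; a RAM-to-RAM DMA controller would replace this loop entirely.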

Simon



