2. I think that HERE we have to summarize the ideas and, as Simon
said, produce a template, a "skeleton", which should be used to start
a zero-copy driver with a DMA implementation, providing some guidelines
that depend on the various DMA limits of the HW; and, yes, some
ready-to-use routines should be provided.
3. When the discussion here produces stable results, move the
changes to the lwIP stack, and add pages to the wiki.
Now I'm reading my microcontroller user manual again... I forgot the
details of how my EMAC DMA works.
After this I will try to discuss here whether the solutions proposed up
to now are applicable to my HW,
and I will try to summarize all the important results in one
email.
Off topic...
I think that the next step to improve lwIP performance, after
these guidelines about zero-copy drivers, should be a review of the BSD
socket layer.
I suppose that many lwIP users use sockets and an OS, and want
good performance.
Bye
Piero
2009/1/7
address@hidden <address@hidden>
Jonathan Larmour wrote:
It certainly needs to be carefully considered. Different devices have
different properties:
- some must have all buffers (RX and TX) in a defined region of memory
- some must have RX in one region, TX in another
- some can have buffers in one region and use that preferentially for
speed, but can fall back to slower memory if needed
- some can tolerate the 'struct pbuf' header, some can't
- some can scatter gather, some can't
- some allow entirely variable-sized buffers
- some have fixed size buffers, which may be fixed at the MTU, or may be
smaller; and may be hard-coded for that hardware (e.g. 128 bytes on
AT91SAM7X) or can be any size, but it has to be fixed for all buffers
when the device is initialised.
- some have fixed size buffers and require them to be filled for all but
the final one in the chain
- some have fixed size buffers and don't mind if they are not all filled
- some have associated buffer descriptors which need fiddling with when
the pbuf data becomes available again after having been freed.
- In any of the above, there can be alignment constraints above and
beyond the CPU architectural alignment; most frequently DMA and/or
cache line.
And the real annoyance: you could have two different devices with
different (non-overlapping) sets of the above restrictions.
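To make the list above a bit more concrete, here is a minimal sketch of
how a driver could describe some of these restrictions per device and
test a buffer against them. It is only an illustration: none of the
names (struct dma_constraints, dma_buffer_ok, dma_buffer_ok_all) exist
in lwIP, and it covers only the region, alignment, fixed-size and
scatter-gather items, not the buffer descriptor handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device description of DMA buffer restrictions.
 * These names are assumptions for illustration, not lwIP API. */
struct dma_constraints {
    uintptr_t region_start;   /* first address the DMA engine can reach */
    uintptr_t region_end;     /* one past the last reachable address */
    size_t    alignment;      /* required start alignment, power of two */
    size_t    fixed_size;     /* 0 = variable size, else max bytes per buffer */
    bool      scatter_gather; /* true if the device can chain buffers */
};

/* Return true if one buffer of a chain with chain_len elements
 * satisfies the restrictions of a single device. */
bool dma_buffer_ok(const struct dma_constraints *c, const void *buf,
                   size_t len, size_t chain_len)
{
    uintptr_t addr = (uintptr_t)buf;

    assert(c != NULL);
    assert(c->alignment != 0 && (c->alignment & (c->alignment - 1u)) == 0);
    assert(c->region_start <= c->region_end);

    /* The buffer must lie completely inside the reachable region. */
    if (addr < c->region_start || addr > c->region_end) {
        return false;
    }
    if (len > c->region_end - addr) {
        return false;
    }
    /* Alignment beyond the CPU's own (DMA and/or cache line). */
    if ((addr & (c->alignment - 1u)) != 0) {
        return false;
    }
    /* Fixed-size hardware buffers cannot hold more than fixed_size bytes. */
    if (c->fixed_size != 0 && len > c->fixed_size) {
        return false;
    }
    /* A chained buffer needs scatter-gather support. */
    if (chain_len > 1 && !c->scatter_gather) {
        return false;
    }
    return true;
}

/* The "two devices" case: the buffer must suit every device it may
 * be handed to. */
bool dma_buffer_ok_all(const struct dma_constraints *devs, size_t ndevs,
                       const void *buf, size_t len, size_t chain_len)
{
    for (size_t i = 0; i < ndevs; i++) {
        if (!dma_buffer_ok(&devs[i], buf, len, chain_len)) {
            return false;
        }
    }
    return true;
}
```

With two devices whose regions do not overlap, dma_buffer_ok_all() can
never succeed, which is exactly the case where a copy cannot be avoided.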
Nice list. I guess the hook method is quite a nice start for a generic
solution: I can't see how we can easily cover all of the above.
In addition to the hooks we could/should provide:
a) an example or template implementation of a relocatable pool to
allocate memory from a specific predefined region,
b) a flag indicating whether a pbuf is allocated for RX or for TX, and
c) an example or ready-to-use routine to be used in low_level_output()
that can copy a pbuf to new pbufs if the originals cannot be used
The rare cases where an RX pbuf is reused for TX can be solved by c).
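As a rough idea of what a) and b) could look like, here is a minimal
sketch of a pool whose memory region is passed in at init time, with an
RX/TX flag stored per buffer. All names and sizes (struct dma_pool,
DMA_BUF_SIZE of 128 bytes, DMA_ALIGN of 32 bytes, ...) are assumptions
for illustration and are not part of lwIP; the sketch does no locking,
so a real driver would have to protect it against interrupts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; real ones depend on the hardware. */
#define DMA_BUF_SIZE  128u  /* fixed buffer size, a multiple of DMA_ALIGN */
#define DMA_BUF_COUNT 8u    /* maximum number of buffers in the pool */
#define DMA_ALIGN     32u   /* required start alignment, power of two */

enum dma_dir { DMA_DIR_RX, DMA_DIR_TX };

struct dma_buf {
    struct dma_buf *next;   /* free-list link */
    enum dma_dir    dir;    /* b) flag: allocated for RX or for TX */
    uint8_t        *data;   /* payload inside the DMA region */
};

struct dma_pool {
    struct dma_buf  bufs[DMA_BUF_COUNT];
    struct dma_buf *free_list;
};

/* Carve aligned fixed-size buffers out of the given region.
 * Returns the number of buffers that fit. */
size_t dma_pool_init(struct dma_pool *pool, void *region, size_t region_len)
{
    uintptr_t start = (uintptr_t)region;
    uintptr_t aligned = (start + (DMA_ALIGN - 1u)) & ~(uintptr_t)(DMA_ALIGN - 1u);
    size_t skipped = (size_t)(aligned - start);
    size_t count = 0;

    assert(pool != NULL && region != NULL);
    pool->free_list = NULL;
    if (region_len < skipped) {
        return 0;
    }
    size_t usable = region_len - skipped;
    uint8_t *base = (uint8_t *)region + skipped;

    while (count < DMA_BUF_COUNT && (count + 1u) * DMA_BUF_SIZE <= usable) {
        struct dma_buf *b = &pool->bufs[count];
        b->data = base + count * DMA_BUF_SIZE;
        b->dir = DMA_DIR_RX;
        b->next = pool->free_list;
        pool->free_list = b;
        count++;
    }
    return count;
}

/* Take a buffer from the pool and mark its direction; NULL if empty. */
struct dma_buf *dma_pool_alloc(struct dma_pool *pool, enum dma_dir dir)
{
    struct dma_buf *b = pool->free_list;
    if (b != NULL) {
        pool->free_list = b->next;
        b->next = NULL;
        b->dir = dir;
    }
    return b;
}

/* Give a buffer back to the pool. */
void dma_pool_free(struct dma_pool *pool, struct dma_buf *b)
{
    assert(pool != NULL && b != NULL);
    b->next = pool->free_list;
    pool->free_list = b;
}
```

A port would call dma_pool_init() once with the address of its
DMA-capable memory, which is what makes the pool "relocatable": the same
code works whether that region is internal SRAM or a linker section.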
Although this is not a complete solution for zero-copy, it would enable
zero-copy DMA for many users who currently have to change the lwIP
core code to get it.
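For c), a minimal sketch of the fallback could look like the following.
It uses a simplified stand-in for a pbuf chain (struct seg) instead of
the real struct pbuf, and all names are assumptions for illustration:
the driver first checks whether every element of the chain can be given
to the DMA engine directly, and only if not copies the chain into a
DMA-capable bounce buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a pbuf chain; not the real struct pbuf. */
struct seg {
    struct seg    *next;     /* next element of the chain, or NULL */
    const uint8_t *payload;  /* data of this element */
    size_t         len;      /* number of bytes in payload */
};

/* Return true if at least one element lies outside [lo, hi) or is not
 * aligned to 'align' (a power of two), i.e. the chain cannot be handed
 * to the DMA engine as it is. */
bool chain_needs_copy(const struct seg *chain, uintptr_t lo, uintptr_t hi,
                      size_t align)
{
    assert(align != 0 && (align & (align - 1u)) == 0);
    assert(lo <= hi);

    for (const struct seg *s = chain; s != NULL; s = s->next) {
        uintptr_t addr = (uintptr_t)s->payload;
        if (addr < lo || addr > hi || s->len > hi - addr) {
            return true;
        }
        if ((addr & (align - 1u)) != 0) {
            return true;
        }
    }
    return false;
}

/* Copy the whole chain into one contiguous buffer.
 * Returns false (nothing useful in dst) if dst is too small. */
bool chain_flatten(const struct seg *chain, uint8_t *dst, size_t dst_len,
                   size_t *out_len)
{
    size_t off = 0;

    assert(dst != NULL && out_len != NULL);
    for (const struct seg *s = chain; s != NULL; s = s->next) {
        if (s->len > dst_len - off) {
            return false;
        }
        if (s->len != 0) {
            memcpy(dst + off, s->payload, s->len);
        }
        off += s->len;
    }
    *out_len = off;
    return true;
}
```

In low_level_output() the driver would then send the original chain when
chain_needs_copy() returns false, and the flattened copy otherwise, so
the copy is only paid for in the cases that really need it.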
Simon