I'm sure you will get lots of answers to this. It's a really
interesting question, so lots of folks will want to put their two
cents in.

I feel that your assessment of feasibility is sound and that your list
of problems and their resolution is reasonably complete. Something
always shows up in implementation, and I'm sure that your project will
be no exception, but I do think that your design is solid.
As for the performance improvement, that's a very significant
question. First, I think that it is important to ask what kind of
performance improvement you seek. If you are just seeking to offload
the host, so that it can go on to do some other task faster, then you
stand a reasonable chance of seeing that happen. If you are ultimately
seeking to increase TCP/IP throughput, that will be more difficult.

Let me start with the effect upon the host. It would be reasonable to
say that you would be saving the time spent on these tasks by
offloading them. But will this be a significant savings? That will
depend on the packet rate of your transfers. If you are trying to
transfer a few hundred packets per second, you will find that your
stack overheads aren't really a big fraction of what the host is
doing. Only when your packet rate is so high that you find yourself
spending a significant fraction of host time in the stack can you hope
to increase the host performance on other tasks by offloading.
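
To put rough numbers on this, here is a back-of-the-envelope sketch in
Python. The per-packet cost of 20,000 cycles is purely an assumed figure
for illustration, not a measurement of lwIP or of any host stack, and the
400 MHz clock is just an example host.

```python
def stack_cpu_fraction(packets_per_sec, cycles_per_packet, cpu_hz):
    """Fraction of host CPU time spent in the stack (capped at 1.0)."""
    return min(1.0, packets_per_sec * cycles_per_packet / cpu_hz)

CPU_HZ = 400_000_000          # example host clock
CYCLES_PER_PACKET = 20_000    # assumed per-packet stack cost, not measured

for pps in (300, 3_000, 15_000):
    frac = stack_cpu_fraction(pps, CYCLES_PER_PACKET, CPU_HZ)
    print(f"{pps:>6} packets/s -> {frac:.1%} of host CPU in the stack")
```

Under these assumptions, a few hundred packets per second costs the host
very little, and offloading only starts to free up meaningful host time
at packet rates in the thousands per second.
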
Increasing the throughput of the TCP/IP socket can be even more
difficult.

First and foremost, it would require that the TOE adapter
have a pretty powerful processor. Since your company produces such
things, this may not be a problem, but it is something to consider.
You couldn't speed up a 400 MHz MIPS host by adding a 50 MHz Nios II-based
TOE, for instance.
Concentrating processing power in a single CPU can often prove to be a
more economical and flexible way to increase overall system
performance. If the cost of the TOE were applied instead to the CPU
power of the host, the performance win might apply both to the stack
performance and to other algorithms on the host. While I don't want to
be too discouraging, I wouldn't really want to pursue an architecture
like this unless the host was already performance bound on a very
Your host may have a stack that can take advantage of some of the
TCP/IP acceleration features of modern MAC/PHY interfaces. For
instance, many newer chips can offload most of the work of TCP/IP
checksum computations. Such features in the chips can only be used if
the stack comprehends that they may exist. Right now, lwIP isn't
really aware of such potential benefits in its network interfaces. You
might find yourself needing to do some work on the lwIP stack to
exploit features like this in your MAC/PHY interfaces.
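
For a sense of the work that checksum offload removes from the CPU, here
is a minimal Python sketch of the 16-bit ones' complement Internet
checksum as described in RFC 1071. It is an illustration only, not lwIP's
implementation. A real TCP checksum also covers a pseudo-header, which is
left out here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                        # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add big-endian 16-bit words
    while total >> 16:                         # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Example bytes taken from the worked example in RFC 1071.
sample = bytes.fromhex("0001f203f4f5f6f7")
print(hex(internet_checksum(sample)))
```

Every payload byte has to pass through a loop like this in software,
which is why hardware that does it on the fly can matter at high data
rates.
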
Your host may have a stack that is in general more capable than lwIP.
You may find that there are features missing in lwIP that you really
need. lwIP is, as its name implies, a lightweight stack. It's
excellent for embedded appliances, but it does lack some features.
You may be very sensitive to the implementation of some algorithms,
most notably the slow start for TCP streams. The good news is that
lwIP is a simple enough implementation that you can tinker with things
like this if you feel that you must. The bad news is that you may find
that you have to do some tinkering to meet your speed goals.
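
To see why slow start can matter for speed, here is an idealized
textbook model in Python. It is not lwIP's code: it assumes no packet
loss, counts the window in whole segments, and uses made-up default
values for the initial window and the slow-start threshold.

```python
def rtts_to_reach(target_segments, initial_cwnd=1, ssthresh=64):
    """Count round trips until cwnd reaches target_segments (no loss assumed)."""
    cwnd, rtts = initial_cwnd, 0
    while cwnd < target_segments:
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double each round trip
        else:
            cwnd += 1                        # congestion avoidance: one more segment per round trip
        rtts += 1
    return rtts

for target in (8, 64, 66):
    print(f"window of {target} segments reached after {rtts_to_reach(target)} round trips")
```

In this model a short transfer can finish before the window has opened
very far, so the details of the ramp-up can dominate the throughput you
actually observe.
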
The transactions between host and TOE may end up giving you a net
increase in packet latencies. For lots of protocols, latencies can
translate into throughput effects. Protocols that really stream like
FTP don't have this problem, but any protocol that is more heavily
acknowledged will. It would be best to analyze the net effect of the
TOE on both packet throughput and latency. Only if both of these were
a win would I be confident that all higher level applications would
actually benefit from the increased performance.
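
As a rough illustration of how latency turns into throughput, consider a
protocol that can only have a fixed amount of data outstanding before it
must wait for an acknowledgement. Its throughput is bounded by that
amount divided by the round-trip time. All of the numbers in this Python
sketch are assumptions chosen for the example, not measurements.

```python
def window_limited_throughput(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is in flight per round trip."""
    return window_bytes / rtt_s

WINDOW = 2048        # assumed bytes outstanding before the sender must wait
BASE_RTT = 0.0010    # assumed 1 ms round trip without the TOE
TOE_EXTRA = 0.0002   # assumed 0.2 ms added by host<->TOE bus crossings

before = window_limited_throughput(WINDOW, BASE_RTT)
after = window_limited_throughput(WINDOW, BASE_RTT + TOE_EXTRA)
print(f"without TOE latency: {before / 1e6:.2f} MB/s")
print(f"with TOE latency:    {after / 1e6:.2f} MB/s")
```

With these figures the added latency lowers the bound no matter how fast
the TOE processes each packet, which is why I would look at latency and
throughput together.
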
Curt McDowell wrote:
I'm looking into using lwIP as the basis for a TOE (TCP/IP offload
engine). If I understand correctly, the lwIP environment is
implemented as one thread for the IP stack, and one thread for each
  APPLICATION THREAD                  IP STACK
  Sockets <-> API-mux <------------> API-demux <-> Stack <-> netif
This architecture appears to lend itself fairly well to the following
TOE implementation (actually, SOE, as it would be a full sockets
offload engine):
            PROCESSOR                        TOE ADAPTER W/ EMBEDDED CPU
+-------------+   +--------------+            +-------+   +----------+
| App using   |---| lwIP library |------------| lwIP  |---| Network  |
| sockets API |   | Sockets API  |  Hardware  | stack |   | hardware |
+-------------+   +--------------+    bus     +-------+   +----------+
- Does this assessment sound correct?
- Could a significant performance improvement be realized, compared
to using a host-native IP stack?
- Is anyone else interested in this type of application?
The only problems that I see are with the mbox
transport mechanism, in that it assumes a shared address space.
- It would need to send the data, instead of
pointers to the data.
- It would need to send messages for event notifications instead of
- Message reception on either side of the hardware bus would
be signaled through interrupts.
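
The first point, sending the data itself rather than a pointer to it,
could be sketched roughly as follows in Python. The header layout
(message type, socket id, payload length), the name MSG_SEND, and both
helper functions are hypothetical and chosen only for illustration; none
of them are part of lwIP.

```python
import struct

# Hypothetical wire header: message type, socket id, payload length,
# all in network byte order with no padding.
HEADER = struct.Struct("!BHI")
MSG_SEND = 1  # hypothetical message type: "send this data on a socket"

def encode_msg(msg_type, sock_id, payload=b""):
    """Copy the payload into the message rather than passing a pointer to it."""
    return HEADER.pack(msg_type, sock_id, len(payload)) + payload

def decode_msg(buf):
    """Return (msg_type, sock_id, payload) from one complete message."""
    msg_type, sock_id, length = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return msg_type, sock_id, payload

wire = encode_msg(MSG_SEND, 3, b"hello")
print(decode_msg(wire))
```

Because the message carries its own length and payload, neither side
needs access to the other's address space to interpret it.
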
lwip-users mailing list
|Gibbons and Associates, Inc.
|TEL: (408) 984-1441
|900 Lafayette, Suite 704, Santa Clara, CA
|FAX: (408) 247-6395