Re: [lwip-users] tcp_write with zero-copy
From: Timmy Brolin
Subject: Re: [lwip-users] tcp_write with zero-copy
Date: Sun, 17 Feb 2008 05:19:19 +0100
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)
I see your point. For applications which transmit large amounts of data
it is better to use separate buffers. This includes pretty much all bulk
data transfer protocols such as ftp/http or streaming media protocols.
Things are a bit different for control protocols.
Regarding TCP versus UDP for request-response protocols: it is common to
use TCP rather than UDP for such protocols simply because TCP guarantees
correct data transfer. A UDP version would need additional ack and
retransmission functionality. Why reinvent the wheel when TCP is already
there?
No assumptions can of course be made about the higher layer "packets"
being aligned to the lower layer IP packets, but in 99+% of all cases
they are. Code which reallocates and aligns any odd unaligned data to
pbuf boundaries is of course necessary.
The kind of protocols I work with are industrial control protocols. The
data sizes per packet/request are typically very small. Requests and
responses usually fit well within a single 128-byte pbuf. All
time-critical data is transmitted using UDP in a cyclic fashion and
needs no ack/retransmission. But things like configuration and
diagnostics are typically done using TCP.
I had a quick look at the tcp_write function, and it seems there could
be some problems involved, such as what happens if the available space
in the send buffer is smaller than a pbuf: should the pbuf be split and
reallocated, or should the entire pbuf be added to the send queue
anyway?
For now I will probably look into implementing something using tcp_sent.
Regards,
Timmy Brolin
Jonathan Larmour wrote:
> Timmy Brolin wrote:
>> Hi,
>> Yes, the rx pool may have to be slightly bigger, but the tx pool
>> could be set to almost zero instead.
>
> Only in a limited subset of applications, I would have thought. Very
> few protocols have responses which you only slightly modify, and send
> back, keeping the same packet size; fewer still TCP-based ones (rather
> than UDP) - I can't think of any. After all, TCP is stream-based, so
> you have no idea how many pieces your message will arrive in at the
> far end. Or if the protocol isn't entirely synchronous, or multiple
> packets of this protocol can be sent at once, then there may be bits
> of subsequent packets within the same pbufs. It seems a little like
> you're trying to make a quite specific scenario more efficient based
> on guarantees that the underlying protocol does not make.
>
>> Determining the optimum balance between rx and tx pool sizes is not
>> very easy as it is now. With true zero copy there would be no such
>> balance. Simply put all available memory into the pbuf pool.
>
> But then you run the risk of running out of configured space for
> receiving data, because it's all used up with data for transmission.
> RX data has to take priority, especially since it includes TCP ACKs.
> Yes, the system may become more "memory efficient" in the sense that
> more of the available memory is used at any time; but this is at the
> expense of deterministic behaviour. It is more deterministic to have
> the general principle of a set of pbufs that are reserved only for rx
> data.
>
>> Today the application has to allocate a buffer for tx data before it
>> can free the rx buf, so momentarily there is twice the amount of
>> memory used, and when the application sends the data, lwip will do a
>> second tx buffer allocation and memcpy, which means yet again there
>> is momentarily double the memory use.
>
> In practice, there may not be any particular problem with having a
> tcp_write_pbuf() variant - that's pretty much just moving existing
> code around a little, so hopefully it wouldn't have any real
> repercussions for normal users. But I wouldn't be happy about
> consolidating the pbuf memory into a single pool in general.
>
>> There are ways of avoiding this second allocation and memcpy by using
>> tcp_sent, but it is not a very practical method since it requires the
>> application to keep track of exactly which data has been sent and
>> acked.
>> I am afraid that I don't quite understand how using pbufs for both rx
>> and tx would use more memory than the separate rx/tx pools use today.
>
> Consider a more general TCP stream than you are using for your
> protocol. There are few constraints on how much data can be enqueued,
> principally TCP_SNDBUF and TCP_SNDQUEUELEN. So an application that has
> a lot of data to send will be able to fill each tcp connection's send
> buffer entirely to those limits. That would be done at the expense of
> rx buffers in your scenario. That greatly risks deadlock.
>
> So you might think then "well, why not just make sure TCP_SNDBUF and
> TCP_SNDQUEUELEN are set to prevent that", in which case you may as
> well have used a separate tx buffer space, since you're again
> effectively dividing up buffer space.
>
> Anyway, I think if you can make a tcp_write_pbuf() implementation that
> would not increase the footprint for those who don't use it, then feel
> free to submit it to the patches page on savannah. If it doesn't
> increase footprint, I'm sure that would be ok to accept (after
> 1.3.0). But it does seem a little to me like the protocol you are
> implementing really should be datagram-based, not stream-based.
>
> Jifl