
Re: [lwip-users] tcp_write with zero-copy


From: Timmy Brolin
Subject: Re: [lwip-users] tcp_write with zero-copy
Date: Sun, 17 Feb 2008 05:19:19 +0100
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)

I see your point. For applications which transmit large amounts of data, separate buffers are the better choice. That covers pretty much all bulk data transfer protocols such as ftp/http or streaming media protocols.
Things are a bit different for control protocols.

Regarding TCP versus UDP for request-response protocols: it is common to use TCP rather than UDP for such protocols simply because TCP guarantees correct data transfer. A UDP version would need additional ack and retransmission functionality, so why reinvent the wheel when TCP is already there? No assumptions can of course be made about the higher-layer "packets" being aligned to the lower-layer IP packets, but in 99+% of all cases they are. Code which reallocates and aligns any odd unaligned data to pbuf boundaries is of course still necessary.
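Something like this rough, untested sketch is the realignment step I have in mind (gather_request and its arguments are just placeholders for whatever the protocol defines):

  #include <string.h>
  #include "lwip/pbuf.h"

  /* Gather one application-layer request into a contiguous buffer.
   * Returns 1 when the full request is available, 0 otherwise. */
  static int
  gather_request(struct pbuf *p, u8_t *req_buf, u16_t req_len)
  {
    if (p->tot_len < req_len) {
      return 0;                                  /* not all data has arrived */
    }
    if (p->len >= req_len) {
      memcpy(req_buf, p->payload, req_len);      /* the common, aligned case */
    } else {
      pbuf_copy_partial(p, req_buf, req_len, 0); /* request straddles pbufs */
    }
    return 1;
  }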

The kind of protocols I work with are industrial control protocols. The data sizes per packet/request are typically very small; requests and responses usually fit well within a single 128-byte pbuf. All time-critical data is transmitted using UDP in a cyclic fashion and needs no ack/retransmission, but things like configuration and diagnostics are typically done using TCP.

I had a quick look at the tcp_write function, and it seems there could be some problems involved, such as what to do if the available space in the send buffer is smaller than a pbuf: should the pbuf be split and reallocated, or should the entire pbuf be added to the send queue anyway?
For now I will probably look into implementing something using tcp_sent.
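Roughly what I mean (a very rough sketch; resp_buf and its size are placeholders, and error handling is omitted): keep the response in an application buffer, tell tcp_write not to copy it, and only reuse the buffer once tcp_sent reports that everything has been acked.

  #include "lwip/tcp.h"

  static u8_t  resp_buf[128];        /* placeholder application tx buffer */
  static u16_t resp_unacked;         /* bytes written but not yet acked */

  static err_t
  my_sent_cb(void *arg, struct tcp_pcb *pcb, u16_t len)
  {
    if (len > resp_unacked) {
      len = resp_unacked;
    }
    resp_unacked -= len;
    /* Once resp_unacked reaches zero, resp_buf may be reused for the
     * next response. */
    return ERR_OK;
  }

  static err_t
  send_response(struct tcp_pcb *pcb, u16_t len)
  {
    err_t err;

    tcp_sent(pcb, my_sent_cb);
    err = tcp_write(pcb, resp_buf, len, 0);   /* 0: do not copy the data */
    if (err == ERR_OK) {
      resp_unacked += len;
    }
    return err;
  }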

Regards,
Timmy Brolin

Jonathan Larmour wrote:

Timmy Brolin wrote:

Hi,

Yes, the rx pool may have to be slightly bigger, but the tx pool could be set to almost zero instead.


Only in a limited subset of applications, I would have thought. Very few protocols have responses which you only slightly modify and send back, keeping the same packet size, and fewer still are TCP-based (rather than UDP); I can't think of any. After all, TCP is stream-based, so you have no idea how many pieces your message will arrive in at the far end. And if the protocol isn't entirely synchronous, or multiple packets of this protocol can be sent at once, then there may be bits of subsequent packets within the same pbufs. It seems a little like you're trying to make a quite specific scenario more efficient based on guarantees that the underlying protocol does not make.

Determining the optimum balance between rx and tx pool sizes is not very easy as it is now. With true zero copy there would be no such balance. Simply put all available memory into the pbuf pool.


But then you risk running out of the configured space for receiving data, because it is all used up by data waiting for transmission. RX data has to take priority, especially since it includes TCP ACKs.

Yes, the system may become more "memory efficient" in the sense that more of the available memory is in use at any time, but that comes at the expense of deterministic behaviour. Sticking to the general principle of a set of pbufs reserved only for rx data is more deterministic.

Today the application has to allocate a buffer for tx data before it can free the rx pbuf, so momentarily twice the amount of memory is in use; and when the application sends the data, lwIP does a second tx buffer allocation and memcpy, which again momentarily doubles the memory use.
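Roughly what that looks like with the raw API today (a simplified sketch; RESP_MAX and the protocol handling are placeholders):

  #include "lwip/tcp.h"

  #define RESP_MAX 128                    /* placeholder buffer size */

  static err_t
  my_recv_cb(void *arg, struct tcp_pcb *pcb, struct pbuf *p, err_t err)
  {
    static u8_t resp[RESP_MAX];           /* buffer number one: the app copy */
    u16_t len;

    if (p == NULL) {                      /* remote host closed */
      return tcp_close(pcb);
    }

    len = pbuf_copy_partial(p, resp, sizeof(resp), 0);  /* memcpy number one */
    tcp_recved(pcb, p->tot_len);
    pbuf_free(p);                         /* rx pbuf goes back to the pool */

    /* ... turn the request in 'resp' into a response, adjust 'len' ... */

    /* With the copy flag set, lwIP allocates tx memory and memcpys the
     * data a second time before queueing it. */
    return tcp_write(pcb, resp, len, 1);
  }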


In practice, there may not be any particular problem with having a tcp_write_pbuf() variant; that's pretty much just moving existing code around a little, so it hopefully wouldn't have any real repercussions for normal users. But I wouldn't be happy about consolidating the pbuf memory into a single pool in general.

There are ways of avoiding this second allocation and memcpy by using tcp_sent, but it is not a very practical method since it requires the application to keep track of exactly which data has been sent and acked. I am afraid I don't quite understand how using pbufs for both rx and tx would use more memory than the separate rx/tx pools use today.


Consider a more general TCP stream than the one you are using for your protocol. There are few constraints on how much data can be enqueued, principally TCP_SND_BUF and TCP_SND_QUEUELEN. So an application that has a lot of data to send will be able to fill each TCP connection's send buffer right up to those limits, and in your scenario that would come at the expense of rx buffers. That greatly risks deadlock.

So you might then think "well, why not just make sure TCP_SND_BUF and TCP_SND_QUEUELEN are set to prevent that", in which case you may as well have used a separate tx buffer space, since you are again effectively dividing up the buffer space.
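Those limits come out of lwipopts.h; purely to illustrate the knobs involved (the values here are made up, only the option names are real):

  #define TCP_MSS             536
  #define TCP_SND_BUF         (2 * TCP_MSS)               /* per-pcb tx space */
  #define TCP_SND_QUEUELEN    (4 * TCP_SND_BUF / TCP_MSS)
  #define TCP_WND             (2 * TCP_MSS)               /* receive window */
  #define PBUF_POOL_SIZE      16                          /* rx pbufs */
  #define PBUF_POOL_BUFSIZE   128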

Anyway, if you can make a tcp_write_pbuf() implementation that does not increase the footprint for those who don't use it, then feel free to submit it to the patches page on Savannah. If it doesn't increase the footprint, I'm sure it would be OK to accept (after 1.3.0). But it does seem to me a little like the protocol you are implementing really should be datagram-based, not stream-based.

Jifl




