
Re: [lwip-users] tcp_write with zero-copy


From: Jonathan Larmour
Subject: Re: [lwip-users] tcp_write with zero-copy
Date: Sun, 17 Feb 2008 01:03:17 +0000
User-agent: Mozilla Thunderbird 1.0.8-1.1.fc4 (X11/20060501)

Timmy Brolin wrote:
> Hi,
>
> Yes, the rx pool may have to be slightly bigger, but the tx pool could then be set to almost zero.

Only in a limited subset of applications, I would have thought. Very few protocols have responses that you only slightly modify and send back at the same packet size, and fewer still are TCP-based rather than UDP-based; I can't think of any. After all, TCP is stream-based, so you have no idea how many pieces your message will arrive in at the far end. And if the protocol isn't entirely synchronous, or several messages can be in flight at once, then the same pbufs may also hold parts of subsequent messages. It seems a little like you're trying to make a quite specific scenario more efficient based on guarantees that the underlying protocol does not make.

> Determining the optimum balance between rx and tx pool sizes is not easy as things stand. With true zero copy there would be no such balance to strike: simply put all available memory into the pbuf pool.

But then you run the risk of running out of configured space for receiving data, because it's all used up with data for transmission. RX data has to take priority, especially since it includes TCP ACKs.

Yes, the system may become more "memory efficient" in the sense that more of the available memory is in use at any time, but this comes at the expense of deterministic behaviour. It is more deterministic to follow the general principle of keeping a set of pbufs reserved for rx data only.
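A minimal sketch of that reservation principle, assuming a hypothetical shared pool in which transmit allocations may never take the last RX_RESERVE free buffers. POOL_SIZE, RX_RESERVE and the pool_* functions are illustrative names, not lwIP API, and the model only counts buffers rather than managing real pbufs:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shared pool: POOL_SIZE pbuf-sized buffers in total, of
 * which tx may never consume the last RX_RESERVE free ones. */
#define POOL_SIZE  16
#define RX_RESERVE 4

typedef struct {
    int in_use_rx;  /* buffers holding received data */
    int in_use_tx;  /* buffers holding queued transmit data */
} shared_pool;

static int pool_free_count(const shared_pool *p)
{
    return POOL_SIZE - p->in_use_rx - p->in_use_tx;
}

/* Receive path: may take any free buffer, reserved ones included. */
static bool pool_alloc_rx(shared_pool *p)
{
    if (pool_free_count(p) == 0)
        return false;
    p->in_use_rx++;
    return true;
}

/* Transmit path: succeeds only if at least RX_RESERVE buffers are
 * still free after the allocation. */
static bool pool_alloc_tx(shared_pool *p)
{
    if (pool_free_count(p) <= RX_RESERVE)
        return false;
    p->in_use_tx++;
    return true;
}

static void pool_free_rx(shared_pool *p)
{
    assert(p->in_use_rx > 0);
    p->in_use_rx--;
}

static void pool_free_tx(shared_pool *p)
{
    assert(p->in_use_tx > 0);
    p->in_use_tx--;
}
```

Under this rule tx can fill most of the pool, but a burst of outgoing data can never take the buffers needed to receive ACKs. The reserve is effectively a small dedicated rx share of the pool.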

> Today the application has to allocate a buffer for tx data before it can free the rx buffer, so momentarily twice the amount of memory is in use. Then, when the application sends the data, lwIP does a second tx buffer allocation and memcpy, which again momentarily doubles the memory use.
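To make the double-allocation argument concrete, here is a hypothetical accounting model. It uses plain malloc/free with a counter, not lwIP code, and compares the path described above (the application copies into its own buffer, then a copying tcp_write() copies again) with the proposed in-place path. Each "buffer" stands for one message-sized allocation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counters for one request/response exchange. */
typedef struct {
    int live;   /* buffers currently allocated */
    int peak;   /* high-water mark of live buffers */
    int copies; /* memcpy operations performed */
} mem_stats;

static void *tracked_alloc(mem_stats *s, size_t len)
{
    void *b = malloc(len);
    assert(b != NULL);
    if (++s->live > s->peak)
        s->peak = s->live;
    return b;
}

static void tracked_free(mem_stats *s, void *b)
{
    assert(s->live > 0);
    free(b);
    s->live--;
}

/* Copy path: app buffer, then the stack's own internal copy. */
static void copy_path(mem_stats *s, size_t len)
{
    void *rx = tracked_alloc(s, len);        /* received data */
    memset(rx, 'A', len);
    void *app_tx = tracked_alloc(s, len);    /* app's response buffer */
    memcpy(app_tx, rx, len);
    s->copies++;
    tracked_free(s, rx);                     /* rx freed only after copy */
    void *stack_tx = tracked_alloc(s, len);  /* copying tcp_write() */
    memcpy(stack_tx, app_tx, len);
    s->copies++;
    tracked_free(s, app_tx);
    tracked_free(s, stack_tx);               /* released once ACKed */
}

/* In-place path: the received buffer is modified and handed over. */
static void zero_copy_path(mem_stats *s, size_t len)
{
    unsigned char *rx = tracked_alloc(s, len);
    memset(rx, 'A', len);
    rx[0] = 'B';                             /* edit the response in place */
    tracked_free(s, rx);                     /* released once ACKed */
}
```

Under these assumptions the copy path peaks at two live buffers and performs two memcpy calls, while the in-place path peaks at one buffer with no copies.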

In practice, there may not be any particular problem with having a tcp_write_pbuf() variant; that's pretty much just moving existing code around a little, so hopefully it wouldn't have any real repercussions for normal users. But I wouldn't be happy about consolidating the pbuf memory into a single pool in general.
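A sketch of what such a variant might do, under stated assumptions. tcp_write_pbuf() does not exist in lwIP, and mini_pbuf/mini_pcb below are simplified stand-ins, not struct pbuf and struct tcp_pcb. The idea is that instead of copying the payload, the stack takes a reference on the caller's buffer and drops it once the data is ACKed. The real struct pbuf already carries a reference count, managed with pbuf_ref() and pbuf_free():

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified reference-counted buffer (NOT lwIP's struct pbuf). */
typedef struct mini_pbuf {
    struct mini_pbuf *next;  /* next buffer in the send queue */
    size_t len;
    int ref;
    unsigned char payload[64];
} mini_pbuf;

/* Simplified connection state (NOT lwIP's struct tcp_pcb). */
typedef struct {
    mini_pbuf *head, *tail;  /* unacknowledged data, oldest first */
    size_t queued;           /* bytes queued */
    size_t snd_buf;          /* remaining send buffer space in bytes */
} mini_pcb;

static mini_pbuf *mini_pbuf_alloc(size_t len)
{
    mini_pbuf *p = calloc(1, sizeof *p);
    assert(p != NULL && len <= sizeof p->payload);
    p->len = len;
    p->ref = 1;
    return p;
}

static void mini_pbuf_free(mini_pbuf *p)
{
    assert(p->ref > 0);
    if (--p->ref == 0)
        free(p);
}

/* Hypothetical tcp_write_pbuf(): queue p itself instead of a copy. */
static int tcp_write_pbuf_model(mini_pcb *pcb, mini_pbuf *p)
{
    if (p->len > pcb->snd_buf)
        return -1;           /* caller keeps ownership and may retry */
    p->ref++;                /* the send queue now holds a reference */
    p->next = NULL;
    if (pcb->tail)
        pcb->tail->next = p;
    else
        pcb->head = p;
    pcb->tail = p;
    pcb->queued += p->len;
    pcb->snd_buf -= p->len;
    return 0;
}

/* Called when the oldest queued buffer has been fully ACKed. */
static void ack_head(mini_pcb *pcb)
{
    mini_pbuf *p = pcb->head;
    assert(p != NULL);
    pcb->head = p->next;
    if (pcb->head == NULL)
        pcb->tail = NULL;
    pcb->queued -= p->len;
    pcb->snd_buf += p->len;
    mini_pbuf_free(p);       /* frees it if the app already let go */
}
```

Because the application and the queue share the same memory, the application must not modify the buffer after handing it over. A real implementation would also have to split buffers larger than the MSS into segments, which this model ignores.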

> There are ways of avoiding this second allocation and memcpy by using tcp_sent, but it is not a very practical method, since it requires the application to keep track of exactly which data has been sent and ACKed. I am afraid that I don't quite understand how using pbufs for both rx and tx would use more memory than the separate rx/tx pools use today.
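The tcp_sent bookkeeping can be sketched as follows, assuming the application passes its own buffers to tcp_write() without copying and therefore must hold them until they are acknowledged. In lwIP the callback registered with tcp_sent() has the form err_t (*)(void *arg, struct tcp_pcb *tpcb, u16_t len), where len is the number of newly ACKed bytes. The standalone on_sent() below only mimics that len argument, and tx_tracker is a hypothetical helper. An ACK can cover part of one buffer or several buffers at once, which is exactly what makes the method awkward:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

/* One application buffer handed to the stack without copying. */
typedef struct {
    const void *data;
    uint16_t len;
} pending_buf;

typedef struct {
    pending_buf q[MAX_PENDING]; /* FIFO ring, oldest at head */
    int head, count;
    uint32_t acked_in_head;     /* bytes of q[head] already ACKed */
    int released;               /* buffers returned to the application */
} tx_tracker;

static int tracker_push(tx_tracker *t, const void *data, uint16_t len)
{
    if (t->count == MAX_PENDING)
        return -1;
    t->q[(t->head + t->count) % MAX_PENDING] = (pending_buf){data, len};
    t->count++;
    return 0;
}

/* Mirrors the len argument of a tcp_sent() callback: the number of
 * newly acknowledged bytes, which may span buffer boundaries. */
static void on_sent(tx_tracker *t, uint16_t len)
{
    uint32_t remaining = len;
    while (remaining > 0) {
        assert(t->count > 0);   /* cannot ACK more than was sent */
        pending_buf *b = &t->q[t->head];
        uint32_t need = b->len - t->acked_in_head;
        if (remaining < need) {
            t->acked_in_head += remaining;  /* partial ACK of head */
            return;
        }
        remaining -= need;
        t->acked_in_head = 0;
        t->head = (t->head + 1) % MAX_PENDING;
        t->count--;
        t->released++;          /* b->data may now be reused */
    }
}
```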

Consider a more general TCP stream than the one you are using for your protocol. There are few constraints on how much data can be enqueued, principally TCP_SND_BUF and TCP_SND_QUEUELEN. So an application that has a lot of data to send will be able to fill each TCP connection's send buffer right up to those limits. In your scenario that would come at the expense of rx buffers, which greatly risks deadlock: if no pbuf is left to receive the peer's ACKs, the queued data can never be released.
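A toy model of that deadlock, assuming a single shared pool with no rx reservation and a per-connection send-queue cap as large as the pool. SND_QUEUE_LIMIT is a hypothetical stand-in for TCP_SND_QUEUELEN, and the numbers are made up. Once the application has filled its queue, there is no buffer left in which to receive the ACK that would let the queue drain:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values: one shared pool, no rx reservation, and a
 * send-queue cap (stand-in for TCP_SND_QUEUELEN) equal to the pool. */
#define POOL_SIZE       16
#define SND_QUEUE_LIMIT 16

typedef struct {
    int used;  /* buffers currently allocated, rx and tx alike */
} pool_model;

static bool pool_take(pool_model *p)
{
    if (p->used >= POOL_SIZE)
        return false;
    p->used++;
    return true;
}

static void pool_give(pool_model *p)
{
    assert(p->used > 0);
    p->used--;
}

/* The application enqueues data until the cap or the pool runs out;
 * returns the number of buffers queued. */
static int fill_send_queue(pool_model *p, int limit)
{
    int queued = 0;
    while (queued < limit && pool_take(p))
        queued++;
    return queued;
}

/* An incoming ACK must be received into a buffer before it can be
 * processed, and only a processed ACK frees acknowledged tx buffers
 * (that release step is not modelled here). */
static bool receive_ack(pool_model *p)
{
    if (!pool_take(p))
        return false;  /* ACK dropped: queued data stays queued */
    pool_give(p);      /* ACK processed, its rx buffer returned */
    return true;
}
```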

So you might then think, "well, why not just make sure TCP_SND_BUF and TCP_SND_QUEUELEN are set to prevent that?" In that case you may as well have used a separate tx buffer space, since you are again effectively dividing up the buffer space.
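The arithmetic behind that point, as a sketch with made-up configuration values (POOL_SIZE, MAX_TCP_CONNS, SND_QUEUE_LIMIT and RX_MIN are illustrative names, not lwIP options). Guaranteeing rx space means the worst-case sum of all send queues must fit beside a minimum rx share, and whatever tx can never reach is, in effect, a dedicated rx pool:

```c
#include <assert.h>

/* Illustrative configuration values, not lwIP defaults. */
#define POOL_SIZE        32  /* total pbufs in the shared pool */
#define MAX_TCP_CONNS     4  /* connections that may be active at once */
#define SND_QUEUE_LIMIT   6  /* per-connection cap on queued tx pbufs */
#define RX_MIN            8  /* pbufs that must always remain for rx */

/* Worst case: every connection fills its send queue at the same time. */
#define TX_WORST_CASE (MAX_TCP_CONNS * SND_QUEUE_LIMIT)

/* C11 compile-time check: reject configurations where tx can starve rx. */
static_assert(TX_WORST_CASE + RX_MIN <= POOL_SIZE,
              "send queues could exhaust the pool and starve rx");

/* Buffers that tx can never reach: in effect, a dedicated rx pool. */
static int effective_rx_pool(void)
{
    return POOL_SIZE - TX_WORST_CASE;
}
```

With these numbers the worst-case tx share is 4 × 6 = 24 pbufs, leaving 8 that only rx can ever use, which is the same split a separate rx pool of 8 would give.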

Anyway, I think if you can make a tcp_write_pbuf() implementation that would not increase the footprint for those who don't use it, then feel free to submit it to the patches page on savannah. If it doesn't increase footprint, I'm sure that would be ok to accept (after 1.3.0). But it does seem a little to me like the protocol you are implementing really should be datagram-based, not stream-based.

Jifl
--
eCosCentric Limited      http://www.eCosCentric.com/     The eCos experts
Barnwell House, Barnwell Drive, Cambridge, UK.       Tel: +44 1223 245571
Registered in England and Wales: Reg No 4422071.
------["The best things in life aren't things."]------      Opinions==mine



