Re: [lwip-devel] What is the correct behavior for missing the reception of an ACK
Fri, 21 May 2021 21:18:24 +0200
On 21.05.2021 at 20:26, Bill Auerbach wrote:
> Hi everyone,
> It’s been a while for me on here, but I’m still actively using lwIP: 20
> years now. Very heavily since 2007, on 4 platforms with an installed base
> of several thousand users. It’s all worked perfectly for us from
> lwIP 1.4 to 2.1.3. Thanks to you all for your continued effort on an
> excellent code base for an important component of our business and many others.
Cool to hear you're still working with lwIP! You might be one of the few
left here from even before my start with lwIP :-)
> All is perfect except for something I’ve been debugging for 3-4
> weeks now. I have a problem when lwIP doesn’t receive an ACK. What I
> see is peculiar because no data is sent to lwIP; it is simply
> sending data once per second to the PC. The sequence number from the PC
> is always 1. When the expected incoming ACK is missed (because of
> something on my end, I know after all this time), the data stream
> stalls. The next 1 s packet after the missed ACK is held back until the
> RTO elapses. The data does continue after this delay with no loss of
> data. On the PC I check that a counter in the data always increases by 1.
> The PC normally flags an error if it has not received its once-per-second
> message within 2 s.
That sounds a bit strange: segments should not be held back here. TCP
has a sliding window of segments in flight, so segments should only be
held back if there's no space in the send window. However, sending 1
segment per second and aborting if there is no update within 2 seconds
seems a bit harsh: if you lose 2 ACKs the way you lost the first, you're lost.
Anyway, can you provide a pcap trace of such a situation? It would be
best if it included the connection setup phase as well (but that's not
necessary if it's too far apart).
> I disabled this error check while debugging. In the field they see
> disconnects anywhere from every 4 hours to every 49 hours, but not on all
> PCs and NEVER on Windows 7, only on Windows 10 with updates from roughly
> 3-6 months ago onward. This is why I got into this: field-reported
> disconnects on a product that has been shipping for 10 years. Why Win10,
> why not all Win10 machines, and why not Win7, I have no idea. Just this
> week I came to see that I missed receiving an ACK: it’s in Wireshark, but
> the lwIP firmware just didn’t process it. Turning on lwIP debug was key.
Having pcaps from Win7 and Win10 to see what's different would be
interesting as well, but I don't think that's really the problem here.
> *With this missed ACK, should the data following that time frame still
> be sent on time?* If this lag is normal behavior, fine. But if you
> think the data should have continued at one packet per second in spite of
> the missed ACK, then we have a problem (admittedly under an error
> condition). I have serial output with TCP_DEBUG and RTO_DEBUG showing
> what happened around this missed ACK if anyone wants to pursue this. If
> the lag is normal and expected behavior, OK, thank you for reading, and
> I’ll try to find out why I missed a packet and move on. As a workaround to
> stop the disconnect errors, I can move to a 10-second wait on the PC if
> I cannot resolve the intermittent missed packet.
I'd probably move to more than 2 seconds anyway once we've found the
reason for this, but (depending on your lwIP configuration) it should
probably still work, so I'd debug this first.
What are your TCP-specific settings? Do you have any transmit queue or
transmit window limitations that would make the stack wait for this
single ACK before it allows more data to be enqueued or sent?
> What made this worse than just being intermittent is that I can run hours
> of burst ping tests, 1000 per second, with sequence checking that they were
> sent and returned in order with no misses, and this passes “forever”.
> It’s also odd because this is only at 100 Mbps and just sending once
> per second. But it fails. Fortunately it fails on my PC, or I’d be in
> for an even longer ride than it’s been.
> Best regards and thank you all for reading this,
> Bill Auerbach
> lwip-devel mailing list