From: Ben Hastings
Subject: [lwip-devel] Out-of-order segments in half-closed connections
Date: Tue, 21 Apr 2009 16:49:51 -0400

I’ve found what I think is an issue with the TCP state machine, but I was
hoping to get some feedback before declaring it a bug in lwIP. The fix
appears to make lwIP handle out-of-order segments in half-closed connections
correctly, but I may very well be overlooking some other scenario that this
would break.

After an active close, the receive callback is usually called with
pcb->state == TIME_WAIT after receiving the remote FIN. Because the other
end of a TCP connection can still send data after tcp_close is called, I do
not free the tcp_arg data or set the receive callback to NULL until I
receive the FIN.

The problem I’m having happens under heavy network traffic when I have
closed the connection and the segment before the remote FIN is lost. In this
case the receive callback is never called because of the missing data, but
the connection closes anyway. So it looks like lwIP transitions to TIME_WAIT
upon receiving a FIN from the other side, regardless of whether the FIN
segment is in order. At this point, any “lost” data that is then resent by
the remote side gets ACKed but is never delivered to the application.

The issue appears to be resolved by copying the same transition criteria
used for the ESTABLISHED state to FIN_WAIT_1 and FIN_WAIT_2. A patch for the
1.3.0-STABLE version of
tcp_in.c is below.

@@ -640,8 +640,8 @@
     }
     break;
   case FIN_WAIT_1:
-    tcp_receive(pcb);
-    if (flags & TCP_FIN) {
+    accepted_inseq = tcp_receive(pcb);
+    if ((flags & TCP_FIN) && accepted_inseq) {
       if (flags & TCP_ACK && ackno == pcb->snd_nxt) {
         LWIP_DEBUGF(TCP_DEBUG,
           ("TCP connection closed %"U16_F" -> %"U16_F".\n", inseg.tcphdr->src, inseg.tcphdr->dest));
@@ -659,8 +659,8 @@
     }
     break;
   case FIN_WAIT_2:
-    tcp_receive(pcb);
-    if (flags & TCP_FIN) {
+    accepted_inseq = tcp_receive(pcb);
+    if ((flags & TCP_FIN) && accepted_inseq) {
       LWIP_DEBUGF(TCP_DEBUG, ("TCP connection closed %"U16_F" -> %"U16_F".\n", inseg.tcphdr->src, inseg.tcphdr->dest));
       tcp_ack_now(pcb);
       tcp_pcb_purge(pcb);

Here’s a packet capture
(from 10.0.1.10) showing the problem. lwIP is running on 10.0.0.98.

No.   Time        Source     Destination  Proto  Info
8220  255.720272  10.0.1.10  10.0.0.98    TCP    3839 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
8226  255.721360  10.0.0.98  10.0.1.10    TCP    80 > 3839 [SYN, ACK] Seq=0 Ack=1 Win=2048 Len=0 MSS=256
8227  255.721367  10.0.1.10  10.0.0.98    TCP    3839 > 80 [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=0
8228  255.721403  10.0.1.10  10.0.0.98    HTTP   GET /style.css HTTP/1.1
8229  255.721408  10.0.1.10  10.0.0.98    HTTP   Continuation or non-HTTP traffic
8234  255.722043  10.0.0.98  10.0.1.10    TCP    80 > 3839 [ACK] Seq=1 Ack=257 Win=2048 Len=0
8235  255.722579  10.0.0.98  10.0.1.10    HTTP   Continuation or non-HTTP traffic
8236  255.722583  10.0.0.98  10.0.1.10    HTTP   Continuation or non-HTTP traffic
8237  255.722589  10.0.1.10  10.0.0.98    TCP    3839 > 80 [ACK] Seq=341 Ack=513 Win=65535 [TCP CHECKSUM INCORRECT] Len=0
8239  255.723393  10.0.0.98  10.0.1.10    HTTP   Continuation or non-HTTP traffic
8240  255.723398  10.0.0.98  10.0.1.10    HTTP   Continuation or non-HTTP traffic
8241  255.723399  10.0.0.98  10.0.1.10    HTTP   Continuation or non-HTTP traffic
8242  255.723405  10.0.1.10  10.0.0.98    TCP    3839 > 80 [ACK] Seq=341 Ack=1281 Win=65535 [TCP CHECKSUM INCORRECT] Len=0
8243  255.724113  10.0.0.98  10.0.1.10    HTTP   Continuation or non-HTTP traffic
8244  255.724117  10.0.0.98  10.0.1.10    HTTP   Continuation or non-HTTP traffic
8245  255.724118  10.0.0.98  10.0.1.10    HTTP   Continuation or non-HTTP traffic
8246  255.724124  10.0.1.10  10.0.0.98    TCP    3839 > 80 [ACK] Seq=341 Ack=2013 Win=65535 [TCP CHECKSUM INCORRECT] Len=0
8247  255.724454  10.0.0.98  10.0.1.10    HTTP   Continuation or non-HTTP traffic
8248  255.724898  10.0.0.98  10.0.1.10    HTTP   Continuation or non-HTTP traffic
8249  255.724902  10.0.0.98  10.0.1.10    HTTP   Continuation or non-HTTP traffic
8250  255.724904  10.0.0.98  10.0.1.10    HTTP   Continuation or non-HTTP traffic
8251  255.724909  10.0.1.10  10.0.0.98    TCP    3839 > 80 [ACK] Seq=341 Ack=3037 Win=65535 [TCP CHECKSUM INCORRECT] Len=0
8252  255.725587  10.0.0.98  10.0.1.10    HTTP   Continuation or non-HTTP traffic
8253  255.725591  10.0.0.98  10.0.1.10    TCP    80 > 3839 [FIN, ACK] Seq=3249 Ack=257 Win=2048 Len=0
8254  255.725597  10.0.1.10  10.0.0.98    TCP    3839 > 80 [ACK] Seq=341 Ack=3250 Win=65323 [TCP CHECKSUM INCORRECT] Len=0
8255  255.725636  10.0.1.10  10.0.0.98    TCP    3839 > 80 [FIN, ACK] Seq=341 Ack=3250 Win=65323 [TCP CHECKSUM INCORRECT] Len=0
8256  255.725930  10.0.0.98  10.0.1.10    TCP    [TCP Dup ACK 8253#1] 80 > 3839 [ACK] Seq=3250 Ack=257 Win=2048 Len=0
8257  255.725935  10.0.1.10  10.0.0.98    HTTP   [TCP Out-Of-Order] Continuation or non-HTTP traffic
8258  255.726268  10.0.0.98  10.0.1.10    TCP    80 > 3839 [ACK] Seq=3250 Ack=341 Win=2048 Len=0
8259  258.080107  10.0.1.10  10.0.0.98    TCP    3839 > 80 [FIN, ACK] Seq=341 Ack=3250 Win=65323 [TCP CHECKSUM INCORRECT] Len=0
8260  258.080419  10.0.0.98  10.0.1.10    TCP    80 > 3839 [ACK] Seq=3250 Ack=342 Win=2048 Len=0

TCP debugging looks
like:

TCP connection request 3839 -> 80.
TCP connection established 3839 -> 80.
tcp_recved: recveived 256 bytes, wnd 2048 (0).
(((the next segment is lost)))
tcp_close: closing in State: ESTABLISHED
TCP connection closed 3839 -> 80.
tcp_pcb_purge
tcp_pcb_purge: data left on ->ooseq
(((receive callback never called)))

Thanks
Ben Hastings