lwip-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lwip-users] Handshake trouble when packet is lost


From: Enrico Murador - Research & Development - CET
Subject: [lwip-users] Handshake trouble when packet is lost
Date: Tue, 03 May 2011 15:17:34 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10

Hi all,

I'm currently using Lwip 1.4.rc2, but the issue I will describe was present also in 1.3.2.

I noticed a strange behaviour when one (or more?) packet is lost (in my case, by "induced" collisions)
during a communication between the software on the (linux-based) PC and my custom board running Lwip.

Following you can see the communication log, filtered by the IP addresses of the PC (192.168.6.77)
and the device running Lwip (192.168.6.112).

NOTE: I "induced" the collisions by using a hub (not a switch) between the PC and the device.
This hub repeats -all- the packets that receives on all the ports, also back to the transmitter port.

No.     Time            Source                Destination           Protocol Info
  10260 10:21:57.699259 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [PSH, ACK] Seq=7091 Ack=27179 Win=63784 Len=58
  10269 10:21:57.701777 192.168.6.112         192.168.6.77          TCP      1024 > 47654 [PSH, ACK] Seq=27179 Ack=7149 Win=666 Len=226
  10270 10:21:57.701827 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [ACK] Seq=7149 Ack=27405 Win=63784 Len=0
  10344 10:21:58.104353 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [PSH, ACK] Seq=7149 Ack=27405 Win=63784 Len=58
  10348 10:21:58.106599 192.168.6.112         192.168.6.77          TCP      1024 > 47654 [PSH, ACK] Seq=27405 Ack=7207 Win=608 Len=226
  10349 10:21:58.106660 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [ACK] Seq=7207 Ack=27631 Win=63784 Len=0
  10431 10:21:58.509983 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [PSH, ACK] Seq=7207 Ack=27631 Win=63784 Len=58
[Normally after 3 ms arrives an ACK or a response]
  10459 10:21:58.712484 192.168.6.77          192.168.6.112         TCP      [TCP Retransmission] 47654 > 1024 [PSH, ACK] Seq=7207 Ack=27631 Win=63784 Len=58
[Operating system (Linux) repeats previous message]
  10464 10:21:58.713390 192.168.6.112         192.168.6.77          TCP      [TCP Previous segment lost] 1024 > 47654 [ACK] Seq=27857 Ack=7265 Win=550 Len=0
[Lwip device replies with an ACK, having "wrong" Seq number (request's ACK + response's Len) ]
  10540 10:21:59.011019 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [PSH, ACK] Seq=7265 Ack=27631 Win=63784 Len=58
[500 ms after first request message (10431) PC application repeats request (there is no response until now)]
  10541 10:21:59.011991 192.168.6.112         192.168.6.77          TCP      1024 > 47654 [ACK] Seq=27857 Ack=7323 Win=1072 Len=0
[Lwip device replies with an ACK, having "wrong" Seq number (request's ACK + response's Len)]
  10542 10:21:59.013301 192.168.6.112         192.168.6.77          TCP      1024 > 47654 [PSH, ACK] Seq=27857 Ack=7323 Win=1072 Len=226
[Lwip device response, having "wrong" Seq number (request's ACK + response's Len)]
  10543 10:21:59.013316 192.168.6.77          192.168.6.112         TCP      [TCP Dup ACK 10540#1] 47654 > 1024 [ACK] Seq=7323 Ack=27631 Win=63784 Len=0
[PC ACK, but operating system does not consider message 10542 as a valid response, and PC application will repeat request.]
  10645 10:21:59.513639 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [PSH, ACK] Seq=7323 Ack=27631 Win=63784 Len=58
[500 ms after second request message (10540) PC application repeats request]
  10646 10:21:59.515902 192.168.6.112         192.168.6.77          TCP      1024 > 47654 [PSH, ACK] Seq=28083 Ack=7381 Win=1014 Len=226
[Lwip device response, having "wrong" Seq number (request's ACK + 2 * response's Len)]
  10647 10:21:59.515939 192.168.6.77          192.168.6.112         TCP      [TCP Dup ACK 10645#1] 47654 > 1024 [ACK] Seq=7381 Ack=27631 Win=63784 Len=0
[PC ACK, but operating system does not consider message 10646 as a valid response, and PC application will consider this as a failure.]
  10685 10:21:59.688023 192.168.6.112         192.168.6.77          TCP      [TCP Retransmission] 1024 > 47654 [PSH, ACK] Seq=27631 Ack=7381 Win=1014 Len=226
[Lwip device response, having "correct" Seq number (I am not sure, but this seems to satisfy operating system mechanism)]
  10686 10:21:59.688059 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [ACK] Seq=7381 Ack=28309 Win=63106 Len=0
  10687 10:21:59.688291 192.168.6.112         192.168.6.77          TCP      [TCP Retransmission] 1024 > 47654 [PSH, ACK] Seq=27857 Ack=7381 Win=1014 Len=226
[PC ACK, success.]
  10688 10:21:59.688322 192.168.6.77          192.168.6.112         TCP      [TCP Dup ACK 10686#1] 47654 > 1024 [ACK] Seq=7381 Ack=28309 Win=63106 Len=0
[Lwip device response, having "wrong" Seq number (request's ACK + response's Len)]
  10788 10:22:00.141265 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [PSH, ACK] Seq=7381 Ack=28309 Win=63784 Len=58
[PC ACK, but operating system does not consider message 10687 as a valid response, but PC application will be satisfied with previous message]
  10797 10:22:00.147804 192.168.6.112         192.168.6.77          TCP      1024 > 47654 [PSH, ACK] Seq=28309 Ack=7439 Win=956 Len=226
  10798 10:22:00.147824 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [ACK] Seq=7439 Ack=28535 Win=63784 Len=0
  10855 10:22:00.536986 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [PSH, ACK] Seq=7439 Ack=28535 Win=63784 Len=58
  10872 10:22:00.547107 192.168.6.112         192.168.6.77          TCP      1024 > 47654 [PSH, ACK] Seq=28535 Ack=7497 Win=898 Len=226
  10873 10:22:00.547123 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [ACK] Seq=7497 Ack=28761 Win=63784 Len=0
  10940 10:22:00.945123 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [PSH, ACK] Seq=7497 Ack=28761 Win=63784 Len=58
  10943 10:22:00.948128 192.168.6.112         192.168.6.77          TCP      1024 > 47654 [PSH, ACK] Seq=28761 Ack=7555 Win=840 Len=226
  10944 10:22:00.948167 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [ACK] Seq=7555 Ack=28987 Win=63784 Len=0
  11029 10:22:01.360900 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [PSH, ACK] Seq=7555 Ack=28987 Win=63784 Len=58
  11031 10:22:01.363038 192.168.6.112         192.168.6.77          TCP      1024 > 47654 [PSH, ACK] Seq=28987 Ack=7613 Win=782 Len=226
  11032 10:22:01.363063 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [ACK] Seq=7613 Ack=29213 Win=63784 Len=0
  11114 10:22:01.767714 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [PSH, ACK] Seq=7613 Ack=29213 Win=63784 Len=58
  11151 10:22:01.972350 192.168.6.77          192.168.6.112         TCP      [TCP Retransmission] 47654 > 1024 [PSH, ACK] Seq=7613 Ack=29213 Win=63784 Len=58
  11158 10:22:01.975497 192.168.6.112         192.168.6.77          TCP      [TCP Previous segment lost] 1024 > 47654 [ACK] Seq=29439 Ack=7671 Win=724 Len=0
  11222 10:22:02.271579 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [PSH, ACK] Seq=7671 Ack=29213 Win=63784 Len=58
  11227 10:22:02.273738 192.168.6.112         192.168.6.77          TCP      1024 > 47654 [PSH, ACK] Seq=29439 Ack=7729 Win=666 Len=226
  11228 10:22:02.273754 192.168.6.77          192.168.6.112         TCP      [TCP Dup ACK 11222#1] 47654 > 1024 [ACK] Seq=7729 Ack=29213 Win=63784 Len=0
  11322 10:22:02.780048 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [PSH, ACK] Seq=7729 Ack=29213 Win=63784 Len=58
  11338 10:22:02.789122 192.168.6.112         192.168.6.77          TCP      1024 > 47654 [PSH, ACK] Seq=29665 Ack=7787 Win=608 Len=226
  11339 10:22:02.789180 192.168.6.77          192.168.6.112         TCP      [TCP Dup ACK 11322#1] 47654 > 1024 [ACK] Seq=7787 Ack=29213 Win=63784 Len=0
  11416 10:22:03.188396 192.168.6.112         192.168.6.77          TCP      [TCP Retransmission] 1024 > 47654 [PSH, ACK] Seq=29213 Ack=7787 Win=608 Len=226
  11417 10:22:03.188415 192.168.6.77          192.168.6.112         TCP      47654 > 1024 [ACK] Seq=7787 Ack=29891 Win=63106 Len=0
  11418 10:22:03.188397 192.168.6.112         192.168.6.77          TCP      [TCP Retransmission] 1024 > 47654 [PSH, ACK] Seq=29439 Ack=7787 Win=608 Len=226
  11419 10:22:03.188510 192.168.6.77          192.168.6.112         TCP      [TCP Dup ACK 11417#1] 47654 > 1024 [ACK] Seq=7787 Ack=29891 Win=63106 Len=0

From packet number 10431 to 10688 we can see one of these "messy handshakes", that "resolves" at the third attempt from the PC to make the request.
From packet number 11114 to the end, the situation is almost the same, apart the fact that the PC failed the third attempt to communicate and the program
stopped at a breakpoint placed on socket close, probably due to time passed between answer (message 11322) and corresponding response (message 11416), 400 ms .

It seems that the board running Lwip "looses" the correct sequence numbering for a while, after at least one packet (for example the reply to packet 10431, i think)
becomes lost. Unfortunately I cannot actually detect how many packets (and when) are really lost.

Sometimes, this synchronization issue doesn't resolve itself also after 5 retries from the PC, 1 second spaced from each other. This lends me to think that is not
a "timing" problem that leads to congestion, but perhaps an error on sequence number calculation... but maybe I'm wrong.

Can someone take a look and tell me if the described situation could be normal in case of packet collisions, or if there is effectively an issue?

Thank you very much
Enrico


reply via email to

[Prev in Thread] Current Thread [Next in Thread]