From: Mike Rosing
Subject: [lwip-users] ARP message stops TCP?
Date: Mon, 17 Jul 2017 16:06:59 -0500 (CDT)
I have a strange problem where LwIP 2.0.2 (raw API) works most of the time but then appears to stop at random. So I ran Wireshark and looked at the messages between LwIP and the server it was talking to.
My system waits for a time range from the server, does a calculation that uses that range as a pair of pointers into a large block of RAM, and then sends the result back to the server. If an ARP exchange happens before the calculation begins, there are no problems. But if an ARP exchange happens while I'm doing the calculation, I can never send the data to the server.
Error checking on tcp_write() and tcp_output() always gives ERR_OK. I use tcp_sent() to register a callback, and if it has not been called after 6 seconds I call tcp_output() again. After 15 tries, I give up and close the connection.
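For readers following along, the retry bookkeeping described above can be sketched roughly as below. This is not the original code. It is a minimal, LwIP-free model of the logic only: the names send_tracker, tracker_start(), tracker_on_sent() and tracker_poll() are hypothetical, the real tcp_write(), tcp_output() and tcp_sent() calls appear only in comments, and it assumes the first tcp_output() counts as try number one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the description: 6 s between tries, give up after 15 tries. */
#define RETRY_INTERVAL_MS 6000u
#define MAX_TRIES         15u

typedef enum { SEND_PENDING, SEND_DONE, SEND_FAILED } send_status;

/* Hypothetical bookkeeping for one outstanding reply. */
typedef struct {
    uint32_t    last_try_ms; /* time of the most recent flush attempt */
    unsigned    tries;       /* flush attempts made so far */
    send_status status;
} send_tracker;

/* Call right after queuing the reply (in real code: after tcp_write()
 * followed by the first tcp_output()). Assumption: that is try 1. */
void tracker_start(send_tracker *t, uint32_t now_ms)
{
    assert(t != NULL);
    t->last_try_ms = now_ms;
    t->tries = 1;
    t->status = SEND_PENDING;
}

/* Call from the callback registered with tcp_sent() in real code. */
void tracker_on_sent(send_tracker *t)
{
    assert(t != NULL);
    t->status = SEND_DONE;
}

/* Poll from the main loop with a millisecond clock. Returns true when the
 * caller should flush again (in real code: call tcp_output() once more).
 * Once the tries are used up, the status becomes SEND_FAILED and the
 * caller would close the connection. */
bool tracker_poll(send_tracker *t, uint32_t now_ms)
{
    assert(t != NULL);
    if (t->status != SEND_PENDING)
        return false;
    /* Unsigned subtraction stays correct if the clock wraps around. */
    if ((uint32_t)(now_ms - t->last_try_ms) < RETRY_INTERVAL_MS)
        return false;
    if (t->tries >= MAX_TRIES) {
        t->status = SEND_FAILED;
        return false;
    }
    t->tries++;
    t->last_try_ms = now_ms;
    return true;
}
```

With these numbers the sketch gives up roughly 90 seconds after the first attempt (15 tries at 6-second intervals) if the sent callback never fires.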
Here is a Wireshark summary where it works:
245 13598.007929140 Dell_ff:f5:5a LogicPro_03:6a:c6 ARP 42 Who has 192.168.1.18? Tell 192.168.1.13
246 13598.008196524 LogicPro_03:6a:c6 Dell_ff:f5:5a ARP 60 192.168.1.18 is at 00:08:ee:03:6a:c6
247 13941.860230081 192.168.1.13 192.168.1.18 TCP 88 41177 → 41177 [PSH, ACK] Seq=1767318291 Ack=45 Win=29200 Len=34
248 13941.860460015 192.168.1.13 192.168.1.18 TCP 88 [TCP Retransmission] 41177 → 41177 [PSH, ACK] Seq=1767318291 Ack=45 Win=29200 Len=34
249 13942.679840223 192.168.1.13 192.168.1.18 TCP 88 [TCP Retransmission] 41177 → 41177 [PSH, ACK] Seq=1767318291 Ack=45 Win=29200 Len=34
250 13942.680185640 192.168.1.18 192.168.1.13 TCP 60 41177 → 41177 [ACK] Seq=45 Ack=1767318325 Win=5898 Len=0
251 13944.948207758 192.168.1.18 192.168.1.13 TCP 76 [TCP Retransmission] 41177 → 41177 [PSH, ACK] Seq=45 Ack=1767318325 Win=5898 Len=22
252 13944.948240490 192.168.1.13 192.168.1.18 TCP 54 41177 → 41177 [ACK] Seq=1767318325 Ack=67 Win=29200 Len=0
In this case LwIP (at 192.168.1.18) took a while to ACK the message and several seconds to do the computation, and then it sent back the answer.
Here is a Wireshark summary where it fails:
263 15129.006878304 192.168.1.13 192.168.1.18 TCP 88 41177 → 41177 [PSH, ACK] Seq=1767318393 Ack=111 Win=29200 Len=34
264 15129.007133951 192.168.1.13 192.168.1.18 TCP 88 [TCP Retransmission] 41177 → 41177 [PSH, ACK] Seq=1767318393 Ack=111 Win=29200 Len=34
265 15132.471845070 192.168.1.13 192.168.1.18 TCP 88 [TCP Retransmission] 41177 → 41177 [PSH, ACK] Seq=1767318393 Ack=111 Win=29200 Len=34
266 15132.472195643 192.168.1.18 192.168.1.13 TCP 60 41177 → 41177 [ACK] Seq=133 Ack=1767318427 Win=5796 Len=0
267 15134.007797724 Dell_ff:f5:5a LogicPro_03:6a:c6 ARP 42 Who has 192.168.1.18? Tell 192.168.1.13
268 15134.008057081 LogicPro_03:6a:c6 Dell_ff:f5:5a ARP 60 192.168.1.18 is at 00:08:ee:03:6a:c6
269 15221.737092479 192.168.1.18 192.168.1.13 TCP 60 [TCP Retransmission] 41177 → 41177 [FIN, ACK] Seq=133 Ack=1767318427 Win=5796 Len=0
The code times out and reports that the write failed: it called tcp_write() and then tcp_output() 15 times, yet the data was never sent.
I think I can force this to work by closing the server side on a timeout (and probably by reducing the number of tries on the client side before giving up). I hope that I simply have something set wrong in opt.h or lwipopts.h.
I suspect the ARP messages just happen to be a clue, not a cause, but I do not know enough about how network protocols are supposed to work. Any ideas on what the problem could be?
Thanks,
Mike