This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Why does an ARP request cause a “TCP previous segment lost”?

We have two Windows XP PCs, 10.10.10.1 and 10.10.10.2, with a direct connection to each other (so not via a router). Both run a user application, and these applications continuously exchange information via TCP/IP. When the communication between the two PCs is logged with Wireshark, it shows that every 10 minutes an ARP request is broadcast to refresh the ARP cache. However once in a while, the ARP message exchange seems to set off the whole network communication. In Wireshark it is followed by a "TCP Previous segment lost" message. Here is an example:

26893 1578.844599 10.10.10.2 10.10.10.1 TCP 146 kyoceranetdev > scol [PSH, ACK] Seq=3613653 Ack=402623 Win=65280 Len=92 2012-06-06 15:22:02.178605 146

26894 1579.029143 IntelCor_8d:0c:6d Broadcast ARP 42 Who has 10.10.10.1? Tell 10.10.10.2 2012-06-06 15:22:02.363149 42

26895 1579.030506 IntelCor_8d:0c:4f IntelCor_8d:0c:6d ARP 60 10.10.10.1 is at 00:15:17:8d:0c:4f 2012-06-06 15:22:02.364512 60

26896 1579.030511 10.10.10.2 10.10.10.1 TCP 191 [TCP Previous segment lost] kyoceranetdev > scol [PSH, ACK] Seq=3615205 Ack=402623 Win=65280 Len=137 2012-06-06 15:22:02.364517 191

26897 1579.030666 10.10.10.1 10.10.10.2 TCP 66 scol > kyoceranetdev [ACK] Seq=402623 Ack=3613745 Win=64538 Len=0 SLE=3615205 SRE=3615342 2012-06-06 15:22:02.364672 66

26898 1579.291510 10.10.10.2 10.10.10.1 TCP 1514 kyoceranetdev > scol [PSH, ACK] Seq=3615342 Ack=402623 Win=65280 Len=1460 2012-06-06 15:22:02.625516 1514

26899 1579.291522 10.10.10.2 10.10.10.1 TCP 62 kyoceranetdev > scol [PSH, ACK] Seq=3616802 Ack=402623 Win=65280 Len=8 2012-06-06 15:22:02.625528 62

26900 1579.298290 10.10.10.1 10.10.10.2 TCP 66 [TCP Dup ACK 26897#1] scol > kyoceranetdev [ACK] Seq=402623 Ack=3613745 Win=64538 Len=0 SLE=3615205 SRE=3616802 2012-06-06 15:22:02.632296 66

26901 1579.298296 10.10.10.1 10.10.10.2 TCP 66 [TCP Dup ACK 26897#2] scol > kyoceranetdev [ACK] Seq=402623 Ack=3613745 Win=64538 Len=0 SLE=3615205 SRE=3616810 2012-06-06 15:22:02.632302 66

26902 1579.298306 10.10.10.2 10.10.10.1 TCP 210 [TCP Fast Retransmission] kyoceranetdev > scol [PSH, ACK] Seq=3613745 Ack=402623 Win=65280 Len=156 2012-06-06 15:22:02.632312 210

26903 1579.298306 10.10.10.2 10.10.10.1 TCP 218 kyoceranetdev > scol [PSH, ACK] Seq=3616810 Ack=402623 Win=65280 Len=164 2012-06-06 15:22:02.632312 218

Anyone any idea why this is happening?

asked 12 Jul ‘12, 05:18

rv_deventer

Can you post a capture file somewhere (perhaps www.cloudshark.org) that shows the problem? It’s hard to tell much from a text printout with only a single ARP request/response pair and only 11 packets total. It’s likely that the ARP and the lost segment are unrelated and it’s just coincidence that the lost segment sometimes comes right after an ARP.

(25 Jul ‘12, 17:30) Jim Aragon


2 Answers:

Let's sum it up in an answer.

Based on the information gathered while analyzing the problem, I conclude:

Windows: ARP Behavior

http://technet.microsoft.com/en-us/library/cc940021.aspx

Look at the end.

Cite: ARP queues only one outbound IP datagram for a given destination address while that IP address is being resolved to a MAC address. .... An application can compensate for this by calling the Iphlpapi.dll routine SendArp() to establish an arp cache entry, before sending the stream of packets.

Also this:

http://support.microsoft.com/kb/194881/en-us

Cite: While Windows NT awaits an ARP Response, ARP will "queue" the IP packet that needs to be sent. When Windows NT receives the ARP Response, it will only transmit the "latest" or last packet that it received in its "ARP Packet Queue" for any given destination host.

Conclusion: It looks like Windows dropped one (or more) of your TCP packets, ALTHOUGH the information above seems to relate only to UDP. Maybe the information in the first link is wrong, or the behavior is the same for TCP and other protocols.

HOWEVER, the behavior in your capture does not fully match the explanation above. At least one packet should have been queued and sent after ARP finished. But maybe that's just the packet Wireshark marks with "TCP Previous segment was not captured", and another packet with payload data was dropped. Without insight into the application, we will never know.

Question: Why does it happen only once in a while?
Answer: It's simply a timing issue. There may be short time frames where the application does not transmit any data. If the ARP refresh takes place in that time frame, you won't see any problems, as no packets need to be queued.
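
To make the timing argument concrete, here is a small simulation of a pending queue that holds packets while ARP is being resolved. It is only a sketch of the behavior described in the Microsoft quotes (queue length 1, the latest packet wins), not of the real Windows stack. The function name send_during_arp, the packet labels and the timestamps are all made up for illustration.

```python
from collections import deque


def send_during_arp(packets, arp_done_at, queue_len=1):
    """Simulate sending while ARP resolution is in progress.

    packets      -- list of (time, label) tuples handed to the IP layer
    arp_done_at  -- time at which the ARP reply arrives
    queue_len    -- packets held per unresolved address (assumed: 1 on Windows)

    Returns (sent, dropped) as two lists of labels.
    """
    if queue_len < 1:
        raise ValueError("queue_len must be at least 1")
    pending = deque(maxlen=queue_len)  # a full deque discards its oldest entry
    sent, dropped = [], []
    for t, label in sorted(packets):
        if t >= arp_done_at:
            # ARP has finished: flush whatever survived in the queue first.
            sent.extend(pending)
            pending.clear()
            sent.append(label)
            continue
        if len(pending) == queue_len:
            dropped.append(pending[0])  # this one is about to be pushed out
        pending.append(label)
    sent.extend(pending)  # flush if no packet arrived after the ARP reply
    return sent, dropped


# Application sends a burst while the ARP refresh is still outstanding.
burst = [(0.0, "seg1"), (0.5, "seg2"), (1.0, "seg3"), (2.0, "seg4")]
print(send_during_arp(burst, arp_done_at=1.5, queue_len=1))
# -> (['seg3', 'seg4'], ['seg1', 'seg2'])

# Application is idle during the ARP refresh: nothing is queued, nothing is lost.
print(send_during_arp([(2.0, "seg4")], arp_done_at=1.5, queue_len=1))
# -> (['seg4'], [])
```

With queue_len=3 (the Linux default mentioned further down) the same burst gets through without loss, which is why the effect depends on both the stack and the moment the application happens to send.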

Solution for your problem, in the order I would recommend:

  • do nothing, as TCP recovers from that situation by itself
  • call SendARP() from time to time in your application to update the ARP cache before the entry expires
  • set a static ARP entry, verified by yourself. However, that's not a good idea

It's apparently the same for other operating systems; however, the "ARP queue length" is different.

Linux: You can configure the ARP queue length with the parameter unres_qlen via sysctl (Default: 3).

http://www.kernel.org/doc/man-pages/online/pages/man7/arp.7.html
http://www.6test.edu.cn/~lujx/linux_networking/0131777203_ch15lev1sec3.html
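
If you want to check the value on a Linux box, something like the following should do. It is a sketch that assumes the neighbour settings are exposed under /proc/sys/net/ipv4/neigh/ as described in arp(7). The helper name read_neigh_param is made up, and newer kernels may also offer unres_qlen_bytes, so a missing file simply yields None.

```python
from pathlib import Path


def read_neigh_param(name, iface="default", base="/proc/sys/net/ipv4/neigh"):
    """Return the integer value of a neighbour (ARP) sysctl, or None if unavailable.

    name  -- parameter name, e.g. "unres_qlen"
    iface -- interface directory, "default" for the system-wide default
    base  -- sysctl directory (assumed location, see arp(7))
    """
    path = Path(base) / iface / name
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # File missing (not Linux, other kernel version) or not a plain integer.
        return None


if __name__ == "__main__":
    for param in ("unres_qlen", "unres_qlen_bytes"):
        print(param, "=", read_neigh_param(param))
```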

AIX: Same for AIX with arpqsize (Default: 12).

http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.prftungd/doc/prftungd/arp_cache_tuning.htm

Regards
Kurt

answered 27 Jul '12, 02:15

Kurt Knochner ♦

Thanks Kurt for this extensive 'research', I understand it is some kind of default behaviour. Still strange that it looks like windows drops packets, except the last one.

Anyway, we think we have 'fixed' it as follows: Every time at start-up of our application the application uses SendARP etc. to once discover the mac address of the other pc. Then it adds a static entry to the arp cache.

Thanks,
Rudy

PS. I tried to award points to you, but in some way I cannot set the award points slider to a value different from 1. (?)

(31 Jul '12, 05:56) rv_deventer

Thanks Kurt for this extensive 'research',

You're welcome and I learned something too ;-))

Still strange that it looks like windows drops packets, except the last one.

That's the way the Windows stack seems to work. It will only buffer one packet during the ARP request. So, if several packets are sent during that time, all are dropped except the last one.

Every time at start-up of our application the application uses SendARP etc. to once discover the mac address of the other pc. Then it adds a static entry to the arp cache.

SendARP() is O.K., but the static ARP entry is problematic!!

This entry will be there until the machine reboots. Imagine that the NIC of the computer for which you have a static entry needs to be replaced (or the whole machine needs to be replaced). After it boots up, you will still try to contact the old MAC address due to the static ARP cache entry. It will take you (your admins, customers) quite some time to figure out what's going wrong. It's certainly better to use SendARP() throughout your application, triggered by a timer.
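
A timer-triggered refresh could look roughly like this. It is a sketch in Python via ctypes, not your application's code: it assumes the iphlpapi SendARP() routine mentioned in the TechNet article, passes 0 as the source address so Windows picks the interface, and the helper names (ip_to_ipaddr, format_mac, send_arp, refresh_periodically) are made up. Whether a call really renews an entry that is still in the cache may depend on the Windows version, so please verify that with a capture.

```python
import ctypes
import socket
import struct
import threading


def ip_to_ipaddr(ip):
    """Dotted IPv4 string -> IPAddr value (network-order bytes read as a native ULONG)."""
    return struct.unpack("=L", socket.inet_aton(ip))[0]


def format_mac(raw, length):
    """Format the first `length` bytes of `raw` as aa:bb:cc:dd:ee:ff."""
    return ":".join(f"{b:02x}" for b in raw[:length])


def send_arp(dest_ip):
    """Resolve dest_ip with SendARP(). Windows only. Returns a MAC string or None."""
    mac = (ctypes.c_ubyte * 6)()
    mac_len = ctypes.c_ulong(ctypes.sizeof(mac))
    rc = ctypes.windll.iphlpapi.SendARP(
        ctypes.c_ulong(ip_to_ipaddr(dest_ip)),
        ctypes.c_ulong(0),  # source address 0: let the stack choose (assumption)
        ctypes.byref(mac),
        ctypes.byref(mac_len),
    )
    return format_mac(bytes(mac), mac_len.value) if rc == 0 else None


def refresh_periodically(dest_ip, interval_s, stop_event, resolve=send_arp):
    """Call resolve(dest_ip) every interval_s seconds until stop_event is set."""
    while not stop_event.wait(interval_s):
        resolve(dest_ip)


if __name__ == "__main__":
    stop = threading.Event()
    # 300 s is an arbitrary value below the 10 minute renewal seen in the capture.
    worker = threading.Thread(
        target=refresh_periodically, args=("10.10.10.1", 300, stop), daemon=True
    )
    worker.start()
    # ... application runs here; call stop.set() on shutdown ...
```

The interval just has to be comfortably shorter than the cache lifetime, so that the refresh happens at a moment you control rather than in the middle of a burst of data.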

I tried to award points to you, but in some way I cannot set the award points slider to a value different from 1. (?)

That's because you only have 1 karma point yourself. Just accept the answer (check mark) and if you like vote it up.

(31 Jul '12, 06:14) Kurt Knochner ♦

I agree with your suggestion that using SendARP throughout the application is the better solution. With our solution, we are aware that we have an issue when the computer for which we have the static entry needs to be replaced. As a first approach we are going to instruct our service department (who do the replacements) to reboot both computers. That will then prevent it. We can then still decide in the future to implement the timer-triggered refresh...

(31 Jul '12, 06:25) rv_deventer

BTW: Why do you need to do anything at all? TCP recovers from the situation by its retransmission mechanism.

(31 Jul '12, 07:03) Kurt Knochner ♦

Anyone any idea why this is happening?

Not a real idea, just some guessing for now.

However once in a while, the ARP message exchange seems to set off the whole network communication

Is that "once in a while" in the regular 10 minute schedule (Windows XP ARP cache renewal time for used entries)?

If NO: You say the computers are connected directly to each other. I assume a CAT5/6 cable. Maybe the RJ45 connector is not plugged in fully (at either end) and, due to small movements (vibrations), the connector loses contact with some pins, which could cause packet loss. Maybe even the link state is lost. However: the time difference between the last TCP packet and the ARP request is probably too short to re-establish the link. Anyway, can you please check the physical connection (unplug/plug) at both ends?

If YES: I have no idea (yet) what causes the loss of at least one packet (10.10.10.2 -> 10.10.10.1).

Regards
Kurt

answered 25 Jul '12, 13:40

Kurt Knochner ♦

edited 25 Jul '12, 13:43

Hi Kurt,

Thanks for your answer.

The ARP broadcast and ARP cache renewal happen every 10 minutes. Most renewals do not set off communication; it's just once in a while. We did replace the UTP cable between the PCs. That did not help. Furthermore, we see the same issue at different customers (we have several systems in the field).

We also did the following test: we added a static entry to the ARP cache. That fixed the communication problem. It proved that the ARP renewal is what sets off (disrupts) the communication.

I uploaded 4 dump files, which show the issue:

PC 10.10.10.1 Dump6.pcap    http://www.cloudshark.org/captures/380ef7488e8e
PC 10.10.10.2 Dump6.pcap    http://www.cloudshark.org/captures/6539364bc235
Dump6
 PC 10.10.10.1       PC 10.10.10.2
   Packet no           Packet no
     2046               2326   ARP
    14544              14824   ARP   -> Sets off communication

PC 10.10.10.1 Dump8.pcap    http://www.cloudshark.org/captures/d550e87c5451
PC 10.10.10.2 Dump8.pcap    http://www.cloudshark.org/captures/51a6c13672d1
Dump8
 PC 10.10.10.1       PC 10.10.10.2
   Packet no           Packet no
      687                267   ARP
    11311              10891   ARP
    20314              19894   ARP   -> Sets off communication
    32437              32017   ARP

Regards
Rudy

(27 Jul ‘12, 00:10) rv_deventer

Looking at capture Dump6.cap, I can see the following:

Wireshark shows “TCP previous segment not captured” for BOTH capture files (Frame #2329 PC_10.10.10.2_Dump6.cap and Frame #2049 PC_10.10.10.1_Dump6.cap).

I conclude: The Windows TCP/IP stack must have dropped the packet internally, before it was sent to the network, maybe due to the ARP request.
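
For reference, the gap Wireshark complains about can be checked by hand from the sequence numbers. The sketch below uses the 10.10.10.2 -> 10.10.10.1 data segments from the text excerpt in the question. It is a simplified version of the idea behind the "previous segment" warning (a segment starts beyond the next expected sequence number), not Wireshark's actual code, and it ignores sequence number wrap-around. The function name find_gaps is made up.

```python
def find_gaps(segments):
    """Find holes in the byte stream of one TCP direction.

    segments -- iterable of (frame, seq, length) in capture order
    Returns a list of (frame, missing_from_seq, missing_bytes).
    """
    gaps = []
    next_expected = None
    for frame, seq, length in segments:
        if next_expected is not None and seq > next_expected:
            # This segment starts after the byte we were waiting for.
            gaps.append((frame, next_expected, seq - next_expected))
        if next_expected is None or seq + length > next_expected:
            next_expected = seq + length
    return gaps


# (frame, Seq, Len) taken from the trace excerpt in the question.
trace = [
    (26893, 3613653, 92),
    (26896, 3615205, 137),
    (26898, 3615342, 1460),
    (26899, 3616802, 8),
]
print(find_gaps(trace))
# -> [(26896, 3613745, 1460)]
```

So 1460 bytes starting at Seq=3613745 never show up before frame 26896, which matches the Ack=3613745 with SLE=3615205 that 10.10.10.1 sends back in frame 26897.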

A quick search on Google revealed a similar problem (with UDP).

http://www.groupsrv.com/computers/about668311.html

Unfortunately the link to experts-exchange.com is dead.

I guess it's either a bug in Windows or "works as designed" ;-))

UPDATE: A rather old link from Microsoft itself, pointing at a similar problem. Maybe it's not just a problem in routing mode and is still not fixed in Windows XP.

http://support.microsoft.com/kb/194881/en-us

Cite: While Windows NT awaits an ARP Response, ARP will "queue" the IP packet that needs to be sent. When Windows NT receives the ARP Response, it will only transmit the "latest" or last packet that it received in its "ARP Packet Queue" for any given destination host.

Doing a bit more "research" ... There needs to be (or at least should be) a queue in the TCP/IP stack that holds packets until the ARP request has finished.

Linux: Take a look at this description of ARP handling in the kernel.

http://www.6test.edu.cn/~lujx/linux_networking/0131777203_ch15lev1sec3.html

You can configure the ARP queue length with the parameter unres_qlen via sysctl (Default: 3).

http://www.kernel.org/doc/man-pages/online/pages/man7/arp.7.html

AIX: Same for AIX with arpqsize (Default: 12).

http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.prftungd/doc/prftungd/arp_cache_tuning.htm

Maybe there is also a parameter for Windows, however I was not able to find one (yet).

I guess this pretty much explains it:

http://technet.microsoft.com/en-us/library/cc940021.aspx

Cite: ARP queues only one outbound IP datagram for a given destination address while that IP address is being resolved to a MAC address. .... An application can compensate for this by calling the Iphlpapi.dll routine SendArp() to establish an arp cache entry, before sending the stream of packets.

Solution for your problem:

  • set a static ARP entry (not a good idea)
  • call SendARP() from time to time to update the ARP cache before the entry expires
(27 Jul '12, 01:25) Kurt Knochner ♦