Intermittent Network Slowness/Complete Loss of Connectivity

I have a network stood up with vSphere. Over the past couple of years I have been experiencing occasional spikes in network latency, or a complete loss of connectivity, between servers. The interesting part is that it's always the same servers that have the issue. (For example, I wrote a script to detect network instability between one host and many others; grepping through months of that data, several servers have upwards of 200 detected events, while others have 0.)

I have been trying desperately to determine the source of these network issues. Recently, I wrote a script that fires at the end of the cron job that detects the network events. The script tests ssh latency between one host and many others. Before it starts the latency test, it starts a packet trace with tshark, filtering on traffic between the host being tested and the host running the script. It also filters on port 22, since the latency test uses ssh commands.
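
For reference, the capture step described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the function names, the interface name `eth0`, the addresses, the 60-second duration, and the output file name are all placeholders. It only builds a tshark command line (using the `-i`, `-f`, `-a duration:` and `-w` options) and prints it; it does not start a capture.

```python
import shlex


def build_capture_filter(local_host: str, remote_host: str, port: int = 22) -> str:
    """Build a BPF capture filter for traffic between two hosts on one TCP port."""
    return f"host {local_host} and host {remote_host} and tcp port {port}"


def build_tshark_command(
    interface: str,
    local_host: str,
    remote_host: str,
    duration_s: int = 60,
    out_file: str = "trace.pcap",
) -> list[str]:
    """Return the tshark argument list; all default values are placeholders."""
    return [
        "tshark",
        "-i", interface,                       # capture interface
        "-f", build_capture_filter(local_host, remote_host),
        "-a", f"duration:{duration_s}",        # stop automatically after N seconds
        "-w", out_file,                        # write raw packets to a file
    ]


# Hypothetical interface and addresses, for illustration only.
cmd = build_tshark_command("eth0", "10.0.0.1", "10.0.0.2")
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.Popen` so the capture runs in the background while the ssh latency test executes.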

Any help on this would be greatly appreciated. I'm a software engineer, not a network guy; I just have enough knowledge to get this far.

Here is the tshark output I collected when a server was experiencing network degradation:

     10 0.574704304    host -> client    TCP 74 56104 > ssh [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=443780045 TSecr=0 WS=128
     33 0.622798903    client -> host    TCP 74 ssh > 56104 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=443784521 TSecr=443780045 WS=128
     34 0.622823233    host -> client    TCP 66 56104 > ssh [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=443780093 TSecr=443784521
     35 0.622872658    host -> client    SSH 87 Client Protocol: SSH-2.0-OpenSSH_7.4\r
     36 0.671884861    client -> host    TCP 66 ssh > 56104 [ACK] Seq=1 Ack=22 Win=29056 Len=0 TSval=443784570 TSecr=443780093
    530 16.713179167    client -> host    SSHv2 527 Server: Key Exchange Init
    531 16.713203567    host -> client    TCP 66 56104 > ssh [ACK] Seq=22 Ack=462 Win=30336 Len=0 TSval=443796184 TSecr=443800624
    532 16.714881821    host -> client    SSHv2 1314 Client: Diffie-Hellman Key Exchange Init
    533 16.715060860    client -> host    TCP 66 ssh > 56104 [ACK] Seq=462 Ack=1270 Win=31872 Len=0 TSval=443800639 TSecr=443796185
    534 16.717441079    client -> host    SSHv2 474 Server: New Keys
    535 16.718751241    host -> client    SSHv2 146 Client: New Keys
    536 16.719009654    client -> host    TCP 130 [TCP segment of a reassembled PDU]
    537 16.719097445    host -> client    TCP 146 56104 > ssh [PSH, ACK] Seq=1350 Ack=934 Win=31360 Len=80 TSval=443796189 TSecr=443800643[Reassembly error, protocol TCP: New fragment past old data limits]
    538 16.724163039    client -> host    TCP 1514 [TCP segment of a reassembled PDU]
    539 16.724173258    client -> host    TCP 74 [TCP segment of a reassembled PDU]
    540 16.724179307    host -> client    TCP 66 56104 > ssh [ACK] Seq=1430 Ack=2390 Win=34304 Len=0 TSval=443796195 TSecr=443800648
    541 16.724268787    host -> client    TCP 450 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443796195 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
    547 16.957149181    host -> client    TCP 450 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443796428 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
    551 17.190125831    host -> client    TCP 450 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443796661 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
    565 17.657163520    host -> client    TCP 450 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443797128 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
    566 17.657491865    client -> host    TCP 78 [TCP Previous segment not captured] ssh > 56104 [ACK] Seq=2742 Ack=1814 Win=34432 Len=0 TSval=443801581 TSecr=443797128 SLE=1430 SRE=1814
    567 17.704925741    client -> host    TCP 418 [TCP Retransmission] [TCP segment of a reassembled PDU]
    568 17.707069100    host -> client    TCP 738 56104 > ssh [PSH, ACK] Seq=1814 Ack=2742 Win=37248 Len=672 TSval=443797177 TSecr=443801629[Reassembly error, protocol TCP: New fragment past old data limits]
    569 17.707270598    client -> host    TCP 66 ssh > 56104 [ACK] Seq=2742 Ack=2486 Win=36864 Len=0 TSval=443801631 TSecr=443797177
    597 18.685946777    client -> host    TCP 642 [TCP segment of a reassembled PDU]
    598 18.725132574    host -> client    TCP 66 56104 > ssh [ACK] Seq=2486 Ack=3318 Win=40064 Len=0 TSval=443798196 TSecr=443802610
    599 18.747191756    host -> client    TCP 146 56104 > ssh [PSH, ACK] Seq=2486 Ack=3318 Win=40064 Len=80 TSval=443798218 TSecr=443802610[Reassembly error, protocol TCP: New fragment past old data limits]
    604 18.973126314    host -> client    TCP 146 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=2486 Ack=3318 Win=40064 Len=80 TSval=443798444 TSecr=443802610[Reassembly error, protocol TCP: New fragment past old data limits]
    608 19.199141784    host -> client    TCP 146 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=2486 Ack=3318 Win=40064 Len=80 TSval=443798670 TSecr=443802610[Reassembly error, protocol TCP: New fragment past old data limits]
    626 19.652156589    host -> client    TCP 146 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=2486 Ack=3318 Win=40064 Len=80 TSval=443799123 TSecr=443802610[Reassembly error, protocol TCP: New fragment past old data limits]
    627 19.652531210    client -> host    TCP 66 ssh > 56104 [ACK] Seq=3318 Ack=2566 Win=36864 Len=0 TSval=443803576 TSecr=443799123
    628 19.652554872    client -> host    TCP 130 [TCP segment of a reassembled PDU]
    629 19.652563170    host -> client    TCP 66 56104 > ssh [ACK] Seq=2566 Ack=3382 Win=40064 Len=0 TSval=443799123 TSecr=443803576
    630 19.652762859    host -> client    TCP 210 56104 > ssh [PSH, ACK] Seq=2566 Ack=3382 Win=40064 Len=144 TSval=443799123 TSecr=443803576[Reassembly error, protocol TCP: New fragment past old data limits]
    631 19.657257053    client -> host    TCP 178 [TCP segment of a reassembled PDU]
    632 19.657447700    host -> client    TCP 130 56104 > ssh [PSH, ACK] Seq=2710 Ack=3494 Win=40064 Len=64 TSval=443799128 TSecr=443803581[Reassembly error, protocol TCP: New fragment past old data limits]
    633 19.677567903    client -> host    TCP 130 [TCP segment of a reassembled PDU]
    634 19.677893785    host -> client    TCP 130 56104 > ssh [PSH, ACK] Seq=2774 Ack=3558 Win=40064 Len=64 TSval=443799148 TSecr=443803601[Reassembly error, protocol TCP: New fragment past old data limits]
    635 19.678241891    client -> host    TCP 130 [TCP segment of a reassembled PDU]
    636 19.678488520    host -> client    TCP 130 56104 > ssh [PSH, ACK] Seq=2838 Ack=3622 Win=40064 Len=64 TSval=443799149 TSecr=443803602[Reassembly error, protocol TCP: New fragment past old data limits]
    637 19.678689498    client -> host    TCP 130 [TCP segment of a reassembled PDU]
    638 19.678894272    host -> client    TCP 130 56104 > ssh [PSH, ACK] Seq=2902 Ack=3686 Win=40064 Len=64 TSval=443799149 TSecr=443803602[Reassembly error, protocol TCP: New fragment past old data limits]
    639 19.679095118    client -> host    TCP 130 [TCP segment of a reassembled PDU]
    640 19.679313200    host -> client    TCP 130 56104 > ssh [PSH, ACK] Seq=2966 Ack=3750 Win=40064 Len=64 TSval=443799150 TSecr=443803603[Reassembly error, protocol TCP: New fragment past old data limits]
    641 19.679619877    client -> host    TCP 130 [TCP segment of a reassembled PDU]
    642 19.679949334    host -> client    TCP 130 56104 > ssh [PSH, ACK] Seq=3030 Ack=3814 Win=40064 Len=64 TSval=443799150 TSecr=443803603[Reassembly error, protocol TCP: New fragment past old data limits]
    643 19.680344884    client -> host    TCP 130 [TCP segment of a reassembled PDU]
    644 19.680551898    host -> client    TCP 130 56104 > ssh [PSH, ACK] Seq=3094 Ack=3878 Win=40064 Len=64 TSval=443799151 TSecr=443803604[Reassembly error, protocol TCP: New fragment past old data limits]
    645 19.680901329    client -> host    TCP 130 [TCP segment of a reassembled PDU]
    646 19.681128070    host -> client    TCP 130 56104 > ssh [PSH, ACK] Seq=3158 Ack=3942 Win=40064 Len=64 TSval=443799152 TSecr=443803605[Reassembly error, protocol TCP: New fragment past old data limits]
    647 19.681431225    client -> host    TCP 130 [TCP segment of a reassembled PDU]
    648 19.681629690    host -> client    TCP 130 56104 > ssh [PSH, ACK] Seq=3222 Ack=4006 Win=40064 Len=64 TSval=443799152 TSecr=443803605[Reassembly error, protocol TCP: New fragment past old data limits]
    649 19.681870571    client -> host    TCP 130 [TCP segment of a reassembled PDU]
    650 19.682068642    host -> client    TCP 130 56104 > ssh [PSH, ACK] Seq=3286 Ack=4070 Win=40064 Len=64 TSval=443799152 TSecr=443803606[Reassembly error, protocol TCP: New fragment past old data limits]
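
One way to summarize a listing like the one above is to parse each line and flag long pauses between consecutive packets, along with the packets tshark marks as `[TCP Retransmission]`. The sketch below is only an assumption about how that could be done: the regular expression is fitted to the column layout shown in this post, the helper names are made up, and the one-second threshold is arbitrary. The sample lines are copied from the capture above.

```python
import re

# Matches: frame number, relative time, "src -> dst", protocol, length, info.
# The pattern is fitted to the column layout shown in this post (an assumption).
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\d+\.\d+)\s+(\S+)\s+->\s+(\S+)\s+(\S+)\s+(\d+)\s+(.*)$"
)


def parse_line(line):
    """Parse one tshark summary line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    frame, t, src, dst, proto, length, info = m.groups()
    return {"frame": int(frame), "time": float(t), "src": src, "dst": dst,
            "proto": proto, "length": int(length), "info": info}


def find_gaps(packets, threshold_s=1.0):
    """Return (prev_frame, next_frame, seconds) for pauses of at least threshold_s."""
    gaps = []
    for prev, cur in zip(packets, packets[1:]):
        delta = cur["time"] - prev["time"]
        if delta >= threshold_s:
            gaps.append((prev["frame"], cur["frame"], delta))
    return gaps


def count_retransmissions(packets):
    """Count packets that tshark flagged as TCP retransmissions."""
    return sum("[TCP Retransmission]" in p["info"] for p in packets)


# A few lines copied verbatim from the capture above.
SAMPLE = """\
 36 0.671884861    client -> host    TCP 66 ssh > 56104 [ACK] Seq=1 Ack=22 Win=29056 Len=0 TSval=443784570 TSecr=443780093
530 16.713179167    client -> host    SSHv2 527 Server: Key Exchange Init
541 16.724268787    host -> client    TCP 450 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443796195 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
547 16.957149181    host -> client    TCP 450 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443796428 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
551 17.190125831    host -> client    TCP 450 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443796661 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
565 17.657163520    host -> client    TCP 450 [TCP Retransmission] 56104 > ssh [PSH, ACK] Seq=1430 Ack=2390 Win=34304 Len=384 TSval=443797128 TSecr=443800648[Reassembly error, protocol TCP: New fragment past old data limits]
"""

packets = [p for p in map(parse_line, SAMPLE.splitlines()) if p is not None]
for prev_frame, next_frame, seconds in find_gaps(packets):
    print(f"gap of {seconds:.3f}s between frames {prev_frame} and {next_frame}")
print(f"retransmissions: {count_retransmissions(packets)}")
```

Run against the full listing, this would surface the roughly 16-second pause between frame 36 and frame 530, and the repeated retransmissions of the same segments (Seq=1430 and later Seq=2486) from host to client.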
