0

Excessive TCP Dup ACK and TCP Retransmissions

asked 2018-12-05 16:45:01 +0000

anonymous user


I'm doing an SFTP transfer between two servers about 70 ms RTT apart and seeing excessive TCP Dup ACKs and TCP Retransmissions. The circuit size is 50 Mbit/sec, but I'm getting a transfer speed of 500 kbit/sec or less. What could be causing this?

Receiver https://www.cloudshark.org/captures/e...

Sender https://www.cloudshark.org/captures/e...


Comments

Could you please enable SACK on both endpoints and do the capture again? Without the SACK option, loss recovery is very inefficient.

Where was the "sender" capture taken? It has a slightly strange IP TTL of 60. Is the capture point several hops away from the endpoint, or is it just an unusual initial TTL?

Packet_vlad ( 2018-12-05 18:11:53 +0000 )

The sender is an AIX server which is why the TTL is unusual and starts at 60. The receiver is a Linux server with SACK already enabled. I will check the setting on the AIX sender.

Based on what you're seeing so far, what do you think is the most likely cause?

neteng.ams ( 2018-12-06 00:55:41 +0000 )

I need to take a closer look at it, but it looks like micro-bursting at 1 Gbit interface speed hitting a buffer or policer hard enough to cause bulk packet loss. At the same time, the recovery process is extremely slow because SACK is absent.

Packet_vlad ( 2018-12-06 09:17:24 +0000 )

The 3-way handshake in the sender capture tells us that the AIX sender doesn't support SACK. It's possible that SACKs may not help in this case - but as @Packet_vlad suggests, SACK is more efficient in general and so you should enable it if you can.

The MSS=1380 in the server's SYN-ACK is a strong clue that there's a Cisco ASA firewall in the path.
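For anyone who wants to check the handshake themselves, here is a minimal sketch of how the MSS and SACK-permitted options can be read out of the raw TCP options field of a SYN or SYN-ACK. The option bytes below are made up for illustration (they are not copied from the captures), and the function names are hypothetical.

```python
def parse_tcp_options(raw: bytes) -> dict:
    """Walk a TCP options field and return {option_kind: payload_bytes}."""
    options = {}
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:          # End of Option List
            break
        if kind == 1:          # No-Operation: single padding byte, no length
            i += 1
            continue
        if i + 1 >= len(raw):  # truncated option, stop parsing
            break
        length = raw[i + 1]    # length covers kind + length + payload
        if length < 2 or i + length > len(raw):
            break              # malformed option, stop parsing
        options[kind] = raw[i + 2 : i + length]
        i += length
    return options


def summarize_handshake_options(raw: bytes) -> dict:
    """Report the MSS (kind 2) and whether SACK-permitted (kind 4) is present."""
    opts = parse_tcp_options(raw)
    mss = None
    if 2 in opts and len(opts[2]) == 2:
        mss = int.from_bytes(opts[2], "big")
    return {"mss": mss, "sack_permitted": 4 in opts}


# Illustrative bytes only (assumed, not taken from the captures):
# MSS 1460, NOP, window scale 2 -> no SACK-permitted option
syn_without_sack = bytes.fromhex("020405b4" "01" "030302")
# MSS 1380, SACK-permitted, NOP, window scale 7
synack_with_sack = bytes.fromhex("02040564" "0402" "01" "030307")

print(summarize_handshake_options(syn_without_sack))
print(summarize_handshake_options(synack_with_sack))
```

SACK is only used when both sides advertise the SACK-permitted option in their SYN segments, so a single side without it is enough to disable it for the connection.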

The minimum RTT of 68.1 ms means that the client and server are relatively far apart.

Thanks for this very interesting capture. There are a couple of elements to the problem.

Philst ( 2018-12-07 03:11:13 +0000 )

1 Answer

3

answered 2018-12-07 05:34:35 +0000

updated 2018-12-11 07:58:08 +0000

There's a very consistent regularity to the way the packets flow from the client to the server. A particular pattern is repeated again and again - at roughly 7 second intervals - with about 500 KB (500,000 bytes) transferred per interval. I'll define the large burst of packets as the start of the pattern.
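As a quick sanity check on that pattern, the arithmetic can be sketched as follows, taking the per-interval volume as roughly 500,000 bytes (the figure that is consistent with the 500 kbit/sec reported in the question). The variable names are just for illustration.

```python
def throughput_kbit_per_s(bytes_per_interval: float, interval_s: float) -> float:
    """Average throughput in kbit/s for a given volume moved per interval."""
    return bytes_per_interval * 8 / interval_s / 1000


bytes_per_pattern = 500_000   # approximate volume per repeating pattern (assumed bytes)
pattern_interval_s = 7.0      # approximate length of one pattern
circuit_kbit_per_s = 50_000   # 50 Mbit/sec circuit from the question

observed = throughput_kbit_per_s(bytes_per_pattern, pattern_interval_s)
utilisation = observed / circuit_kbit_per_s

# prints: 571 kbit/s, 1.1% of the circuit
print(f"{observed:.0f} kbit/s, {utilisation:.1%} of the circuit")
```

So one pattern every 7 seconds lines up with the transfer speed the question describes.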

Here are my key observations (with some supporting TCP Trace charts below):

1) The client sends a large burst of 250 KB, but large portions are lost after the first 100 KB. In the first TCP-Trace chart below, we see the 100 KB successful burst, the yellow area with no packets, a few subsequent packets that made it through, then one RTT later the second 100 KB burst.

2) The server's receive window is close to 1 MB, but the sender appears to use its own RWIN of 261,288 bytes as its transmit "limit". The sender manages to keep close to this amount of data "in flight" throughout the whole period.

3) One RTT later, the sender receives ACKs for the 100 KB that wasn't lost and manages to transmit a further 100 KB without any errors. The large number of originally lost packets triggers many Dup-ACKs and, in response, the sender retransmits a single packet to begin to fill the gap. Following the horizontal "Ack line" on the chart, we see the single retransmitted packet and the step up of the Ack line.

4) One RTT after that, there's another single packet retransmission.

5) The large initial gap in the received data is then filled in at the rate of just one packet per RTT. Also, after several RTTs (perhaps as the sending congestion window is opened), the sender begins to send small bursts of new data so that the in-flight value of 250 KB is maintained.

6) On the second chart below, we've zoomed-out to encompass a full pattern and the start of the next one. The dark blue circle is around the initial two large bursts, the red circle is around all the single packet retransmissions and the light blue circle is around all the small bursts of new packets. It looks like the sender eventually waits for every sixth round trip so that it can send a full 6-packet application "block" (there's a Push flag at the end of these 6-packet bursts).

7) Eventually, the original large gap has been completely filled-in and we see the Ack line jump all the way up as the original two large bursts and all the smaller "new" bursts are fully acknowledged. At this point, in-flight data is zero and the sender is now free to begin the whole pattern all over again.
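The observations above suggest why each pattern lasts about 7 seconds. Here is a rough sketch of the recovery time when a gap is filled at one segment per RTT. It assumes, as an upper bound, that everything after the first 100 KB of the 250 KB burst was lost (a few packets actually got through), and that each retransmission carries a full 1380-byte segment.

```python
import math


def recovery_time_s(lost_bytes: int, segment_bytes: int, rtt_s: float) -> float:
    """Time to fill a gap when only one segment is retransmitted per RTT."""
    lost_segments = math.ceil(lost_bytes / segment_bytes)
    return lost_segments * rtt_s


burst_bytes = 250_000      # initial burst size from observation 1
delivered_bytes = 100_000  # portion that arrived before the loss began
segment_bytes = 1380       # assumed payload per segment (MSS seen in the handshake)
rtt_s = 0.0681             # minimum RTT seen in the capture

# Upper bound on the gap: assumes nothing after the first 100 KB arrived
lost_bytes = burst_bytes - delivered_bytes
lost_segments = math.ceil(lost_bytes / segment_bytes)
estimate = recovery_time_s(lost_bytes, segment_bytes, rtt_s)

print(f"{lost_segments} segments to retransmit, about {estimate:.1f} s to recover")
```

Under these assumptions the estimate comes out at a little over 7 seconds, which is in line with the interval seen between patterns.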

TCP Trace - Initial Burst

TCP Trace - Full Pattern
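Separately from the packet loss, observation 2 implies a ceiling of its own. A minimal sketch, using the 261,288-byte in-flight limit, the 68.1 ms minimum RTT and the 50 Mbit/sec circuit, compares that limit with the bandwidth-delay product of the path. The helper names are hypothetical.

```python
def window_limited_mbit_per_s(window_bytes: int, rtt_s: float) -> float:
    """Best-case throughput when at most window_bytes can be in flight per RTT."""
    return window_bytes * 8 / rtt_s / 1e6


def bandwidth_delay_product_bytes(link_bit_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the link full for one RTT."""
    return link_bit_per_s * rtt_s / 8


in_flight_limit = 261_288   # sender's apparent transmit limit (observation 2)
rtt_s = 0.0681              # minimum RTT seen in the capture
link_bit_per_s = 50e6       # 50 Mbit/sec circuit from the question

ceiling = window_limited_mbit_per_s(in_flight_limit, rtt_s)
bdp = bandwidth_delay_product_bytes(link_bit_per_s, rtt_s)

print(f"window-limited ceiling: {ceiling:.1f} Mbit/s")
print(f"bandwidth-delay product: {bdp:,.0f} bytes")
print(f"limit covers {in_flight_limit / bdp:.0%} of the BDP")
```

In other words, even with no loss at all, a sender that keeps only about 261 KB in flight could not fill this circuit at this RTT, although that ceiling is far above the throughput actually being achieved.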

So, what are the things that need to be investigated further?

A) The bulk packet loss, always after a large burst of 100 KB, points to a device in the path that has only 100 KB of buffer space. The most likely candidate will be the router where the path ... (more)


Stats

Asked: 2018-12-05 16:45:01 +0000

Seen: 20,569 times

Last updated: Dec 11 '18