
First the low-hanging fruit: is the window size too small, or not scaled enough? To calculate this, you need the window size in use and the round-trip time. The window size on the receiving end is ~2 MB, and the round-trip time is between 0.144 and 7.5 ms. That variation is not real; it is an artifact of the way the capture was made. Even so, with these numbers each TCP stream could reach a bandwidth between 2 Gbit/s (7.5 ms RTT) and 40 Gbit/s (0.144 ms RTT). With 16 parallel streams, you could easily fill the 10 Gbit/s pipe. So window sizes and scaling are not the issue.
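A quick sketch of that arithmetic, assuming a 2 MiB receive window (the exact upper figure shifts with the effective window size, but the conclusion holds either way):

```python
# Per-stream throughput is bounded by window / RTT (the bandwidth-delay
# product rearranged). Window and RTT values are the ones read from the
# capture; the 2 MiB window is an assumption for this sketch.

def max_throughput_gbps(window_bytes, rtt_seconds):
    """Upper bound on a single TCP stream's throughput in Gbit/s."""
    return window_bytes * 8 / rtt_seconds / 1e9

window = 2 * 1024 * 1024                       # ~2 MB receive window
slow = max_throughput_gbps(window, 7.5e-3)     # slowest observed RTT
fast = max_throughput_gbps(window, 0.144e-3)   # fastest observed RTT

# Even the slowest-RTT bound (~2.2 Gbit/s) times 16 parallel streams
# comfortably exceeds the 10 Gbit/s pipe.
print(slow, fast, slow * 16)
```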

Then, looking at the TCP bandwidth of a couple of those sessions, you can see that throughput rises, then drops, then rises again, but not as high as before. This is typical for a congestion window that is being influenced by network conditions. Looking at the packets, there are cases of TCP fast retransmissions, indicating there might actually be some packet loss on this connection. This pcap makes it hard to tell for sure, as there are also cases of packets simply missing from the capture.
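The rise/drop/rise-but-lower pattern is exactly what additive-increase/multiplicative-decrease (AIMD) congestion control produces. A toy model, with made-up numbers rather than values from the pcap:

```python
# Toy AIMD model: the congestion window grows additively each RTT and is
# halved on a loss event, so after a loss it climbs back but may be cut
# down again before reaching its previous peak.

def aimd(rounds, loss_rounds, start=10.0, incr=1.0):
    """Return the window size per round under AIMD.

    loss_rounds is the set of rounds (hypothetical here) where a loss
    event halves the window.
    """
    w, trace = start, []
    for r in range(rounds):
        if r in loss_rounds:
            w /= 2          # multiplicative decrease on packet loss
        else:
            w += incr       # additive increase per RTT
        trace.append(w)
    return trace

trace = aimd(30, loss_rounds={10, 18})
```
A plot of `trace` gives the same sawtooth shape as the bandwidth graphs of these sessions: a second loss before full recovery keeps each peak below the last.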

Without a clean, 100% accurate packet capture, it is impossible to tell from the packets what the real cause is: look at all the packets with a 0 microsecond delta and all the packets missing from the capture. So if you need proper analysis, get a 10 Gbit/s TAP and a capture solution that can keep up.

Since that might be the issue, start by looking at the port statistics of both servers and both switch ports to see whether there are any errors and/or discards. I suspect a bit of packet loss is tuning down the congestion window and limiting the throughput. If there are fiber connections involved, make sure the connectors are properly cleaned, as dirty fiber is a very common source of just a tiny bit of errors, which is enough to keep shrinking the congestion window.
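When combing through those port statistics, you are looking for any error/discard counter that is not zero. A small sketch of that check, using made-up sample output in the style of per-port counters (e.g. what `ethtool -S` prints on a Linux server); feed in your real output:

```python
# Flag non-zero error/discard counters in "name: value" statistics text.
# The sample below is invented for illustration.

sample = """\
rx_packets: 184223344
tx_packets: 171002381
rx_crc_errors: 412
rx_discards: 0
tx_errors: 0
"""

def suspect_counters(stats_text):
    """Return (name, value) pairs whose name hints at errors or
    discards and whose value is non-zero."""
    bad = []
    for line in stats_text.splitlines():
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if value.isdigit() and int(value) > 0 and any(
                key in name for key in ("err", "discard", "drop", "crc")):
            bad.append((name, int(value)))
    return bad

flagged = suspect_counters(sample)
print(flagged)
```
Even a counter as small as a few hundred CRC errors on a 10 Gbit/s link is worth chasing; it only takes a tiny loss rate to keep the congestion window down.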