[Edit] Note: This answer applies to the newer "dest" and "src" PCAPs. The behaviour there is very, very different from that in the original "slow" PCAP, where there are very real packet losses and retransmissions.
Thus, this answer shouldn't be compared to the other answers here.
The "slow" PCAP deserves its own separate answer.
In "src", I can see that the slow throughput is caused by the sender going into congestion avoidance mode in response to apparent, but not real, packet losses. Severe out-of-order (OOO) events, such as one full-sized packet overtaking 9 other full-sized packets, happen regularly.
For an example, have a look at packet #53529 in the "src" PCAP. Observe the next 9 data packets as well as the intervening SACKs.
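To put a number on "how far" a packet jumped the queue, here is a minimal sketch that counts, for each arriving segment, how many earlier-sent segments were still missing when it arrived. It assumes segments are identified by their send index (0, 1, 2, ...) and that send order equals sequence order, i.e. no retransmissions. The arrival list is made up to mirror the pattern described above; it is not extracted from the PCAPs, and `overtake_depths` is a hypothetical helper.

```python
from typing import List


def overtake_depths(arrival_order: List[int]) -> List[int]:
    """For each arriving segment, count the earlier-sent segments that were
    still missing when it arrived.

    Segments are identified by send index (0, 1, 2, ...), assumed to match
    sequence-number order (no retransmissions). A depth of 0 means in-order
    delivery; a depth of 9 means the segment overtook 9 others.
    """
    arrived = set()
    depths = []
    for seg in arrival_order:
        # Earlier-sent segments that have not shown up yet.
        missing = sum(1 for earlier in range(seg) if earlier not in arrived)
        depths.append(missing)
        arrived.add(seg)
    return depths


# Made-up arrival order: segment 9 overtakes segments 0..8.
arrivals = [9, 0, 1, 2, 3, 4, 5, 6, 7, 8]
print(max(overtake_depths(arrivals)))  # 9
```

Full-sized packets with depths well above a handful are the events worth chasing in the network path.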
When a packet arrives ahead of its turn, the receiver sends SACKs implying that the earlier packets are missing. However, those "missing" packets arrive very shortly afterwards (sub-millisecond).
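The receiver side of this can be sketched as a toy model. It works in whole segments rather than bytes, sends one ACK per arriving segment (no delayed ACKs), lists SACK blocks in ascending order rather than the "most recent first" ordering of RFC 2018, and does not cap the number of blocks. It also reports a segment that arrives a second time as a D-SACK block, in the spirit of RFC 2883. The function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Block = Tuple[int, int]  # [start, end) in segment numbers


def sack_blocks(ooo: Set[int], rcv_nxt: int) -> List[Block]:
    """Group out-of-order segments above the cumulative ACK into runs."""
    blocks: List[Block] = []
    for seg in sorted(s for s in ooo if s >= rcv_nxt):
        if blocks and blocks[-1][1] == seg:
            blocks[-1] = (blocks[-1][0], seg + 1)  # extend current run
        else:
            blocks.append((seg, seg + 1))  # start a new run
    return blocks


def receive(arrivals: List[int]) -> List[Tuple[int, List[Block], Optional[Block]]]:
    """Return (cumulative ACK, SACK blocks, D-SACK block or None) per arrival."""
    rcv_nxt = 0          # next in-order segment expected
    ooo: Set[int] = set()  # segments held above a hole
    acks: List[Tuple[int, List[Block], Optional[Block]]] = []
    for seg in arrivals:
        dsack: Optional[Block] = None
        if seg < rcv_nxt or seg in ooo:
            dsack = (seg, seg + 1)  # already received: report as duplicate
        else:
            ooo.add(seg)
            while rcv_nxt in ooo:  # hole filled: advance cumulative ACK
                ooo.remove(rcv_nxt)
                rcv_nxt += 1
        acks.append((rcv_nxt, sack_blocks(ooo, rcv_nxt), dsack))
    return acks


# Segment 9 arrives first: the next 9 ACKs all SACK it while 0..8 trickle in.
for ack, sacks, dsack in receive([9, 0, 1, 2, 3, 4, 5, 6, 7, 8]):
    print(ack, sacks, dsack)
```

Every ACK until segment 8 arrives carries a SACK block above a hole, which is exactly what the sender sees as "something is missing".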
When the OOO events are severe enough (which happens very often), the SACKs make the sender believe that there were real packet losses, so it halves its transmit window and then ramps up slowly. Sometimes the sender actually retransmits data packets, causing the receiver to send D-SACKs (indicating that data was received twice).
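How a sender turns SACK information into a loss decision differs between TCP stacks (for example RFC 6675's SACK-based recovery or the time-based RACK algorithm), and which one the sender here uses is not known. The sketch below uses a deliberately simplified, hypothetical FACK-style rule: an un-SACKed segment is presumed lost once the highest SACKed segment lies more than `reorder_threshold` segments beyond it. The threshold of 3 is an assumption. Under this rule a shallow reordering is tolerated while a deep one looks like loss and halves the window.

```python
from typing import List, Set


def presumed_lost(snd_una: int, sacked: Set[int], reorder_threshold: int = 3) -> List[int]:
    """Hypothetical FACK-style rule (not any specific stack's algorithm):
    un-SACKed segments far enough below the highest SACKed one are 'lost'."""
    if not sacked:
        return []
    highest = max(sacked)
    return [
        s for s in range(snd_una, highest)
        if s not in sacked and highest - s > reorder_threshold
    ]


def cwnd_after_sack(cwnd: int, snd_una: int, sacked: Set[int]) -> int:
    """Any presumed loss triggers one halving of the window (min 1 segment)."""
    return max(cwnd // 2, 1) if presumed_lost(snd_una, sacked) else cwnd


# Segment 9 SACKed while 0..8 are still in flight: looks like loss.
print(cwnd_after_sack(80, 0, {9}))  # 40
# Segment 3 SACKed while 0..2 are in flight: within the threshold.
print(cwnd_after_sack(80, 0, {3}))  # 80
```

If the "lost" segments are then retransmitted even though the originals were merely delayed, the receiver gets them twice and answers with D-SACKs.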
When 3 transmit window "halvings" occur close together, the transmit window ends up reduced by a factor of 8 (2 × 2 × 2). The common congestion avoidance rule of "increase the transmit window by just one extra packet per round trip" then needs many round trips to restore it, so throughput is dramatically reduced.
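The arithmetic can be checked with a short sketch. It assumes a hypothetical starting window of 80 segments, Reno-style halving as described above, and linear growth of one segment per round trip; other congestion control algorithms such as CUBIC use a different decrease factor and growth curve. The helper names are made up for illustration.

```python
from typing import Tuple


def window_after_halvings(cwnd: int, halvings: int) -> int:
    """Halve the window once per apparent loss (integer segments, min 1)."""
    for _ in range(halvings):
        cwnd = max(cwnd // 2, 1)
    return cwnd


def regrowth(start: int, target: int) -> Tuple[int, int]:
    """Grow by +1 segment per RTT from start back to target.
    Returns (round trips needed, segments sent during those round trips)."""
    windows = list(range(start, target))  # window used in each RTT
    return len(windows), sum(windows)


cwnd = window_after_halvings(80, 3)  # 80 -> 40 -> 20 -> 10
rtts, sent = regrowth(cwnd, 80)
print(cwnd, rtts, sent)              # 10 70 3115
print(f"{sent / (80 * rtts):.0%}")   # 56% of what a steady 80-segment window sends
```

So three close-together apparent losses cost roughly 70 round trips of reduced sending, and the OOO events recur long before that recovery completes.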
So your problem is packets arriving OOO. Where and why would full-sized packets overtake several other full-sized packets in your network? I stress "full-sized" because it is more common for very small packets to overtake big ones.
I'll add some more packet examples when I've had more time to look at the PCAPs.