Low throughput between vmWare hosts in vxlan topology - spurious retransmissions.
Got this weird case of low throughput between hosts. Seems to me, the problem starts when there are 3 outstanding packets in flight that are not full size. The receiver seems to acknowledge them first, then the rest of the packets. By that time the receiver has generated 3 duplicate ACK packets which triggers the sender re-transmit packets that were already acknowledged. Took a capture at both sides while running iperf. Throughput from A to B is low (around 1Gbps), B to A is ok. If we remove one leg of the port-channel on B side, throughput improves significantly. download packet captures
Have you ever tried to use the port-channel on B-side as an active-backup configuration instead of a load balancing configuration?
BTW you are facing real packet loss here, too!
Thanks Christian, we are not using active-standby port channels. Where do you see the packet loss? I see exactly the same number of sent and received packets at both sides:
yes you are right no loss, I was missled by my wireshark. -> Different story
But then I think you should try an active-standby setup as the out of order arrivals are slowing down your session. Or you can post the trace, where you disconnected one leg, so I can proof my assumption.
Captures with one of the port channel legs down.
It seems odd that in certain times the receiver only acks packets that are not full length size (and out of order) even though other packets arrived before that. Looks like some sort of buffering to me, for example packet # 68737 (dup ack #3) on the receiver side, the sack field is: SACK: 2996108614-2996109446 2996103990-2996104270 2996058358-2996059102, which are not full size.Then in the next 6 packets the receiver acknowledges the previously received full size packets, but by that time it is too late the sender received 3 dup acks and re-sends the "missing" segment, marked as spurious transmission in Wireshark.