Low throughput between VMware hosts in a VXLAN topology - spurious retransmissions.
Got this weird case of low throughput between hosts. It seems to me the problem starts when there are 3 outstanding packets in flight that are not full size. The receiver seems to acknowledge those first, and only then the rest of the packets. By that time the receiver has generated 3 duplicate ACKs, which triggers the sender to retransmit packets that were already acknowledged. Took a capture at both sides while running iperf. Throughput from A to B is low (around 1 Gbps), B to A is OK. If we remove one leg of the port-channel on the B side, throughput improves significantly. download packet captures
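To make that trigger easier to spot in the captures, here is a minimal sketch (not the exact analysis used here) that walks a trace with scapy and flags the point where the receiver emits a third duplicate ACK, i.e. where the sender will fast-retransmit even if the data actually arrived. The capture path and receiver IP are placeholders.

from scapy.all import rdpcap, IP, TCP

CAPTURE = "receiver_side.pcap"   # placeholder path
RECEIVER = "10.0.0.2"            # placeholder: IP of the iperf receiver

last_ack = {}    # (src, sport, dst, dport) -> last ACK number seen
dup_count = {}   # same key -> consecutive duplicate ACK counter

for i, pkt in enumerate(rdpcap(CAPTURE)):
    if not (pkt.haslayer(IP) and pkt.haslayer(TCP)):
        continue
    ip, tcp = pkt[IP], pkt[TCP]
    # Only look at pure ACKs coming back from the receiver (no payload).
    if ip.src != RECEIVER or len(tcp.payload) > 0:
        continue
    key = (ip.src, tcp.sport, ip.dst, tcp.dport)
    if key in last_ack and tcp.ack == last_ack[key]:
        dup_count[key] = dup_count.get(key, 0) + 1
        if dup_count[key] == 3:
            # Third duplicate ACK: the sender will fast-retransmit the
            # segment starting at this ACK number, needed or not.
            print(f"frame {i}: 3rd dup ACK for seq {tcp.ack} on {key}")
    else:
        dup_count[key] = 0
    last_ack[key] = tcp.ack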
Have you ever tried to use the port-channel on B-side as an active-backup configuration instead of a load balancing configuration?
BTW you are facing real packet loss here, too!
Thanks Christian, we are not using active-standby port channels. Where do you see the packet loss? I see exactly the same number of sent and received packets at both sides:
Yes, you are right, no loss. I was misled by my Wireshark. -> Different story
But then I think you should try an active-standby setup, as the out-of-order arrivals are slowing down your session. Or you can post the trace where you disconnected one leg, so I can prove my assumption.
Captures with one of the port channel legs down.
It seems odd that at certain times the receiver only acks packets that are not full size (and out of order) even though other packets arrived before them. Looks like some sort of buffering to me. For example, in packet #68737 (dup ACK #3) on the receiver side, the SACK field is: SACK: 2996108614-2996109446 2996103990-2996104270 2996058358-2996059102, and none of these ranges is full size. Then in the next 6 packets the receiver acknowledges the previously received full-size packets, but by that time it is too late: the sender has received 3 dup ACKs and re-sends the "missing" segment, marked as a spurious retransmission in Wireshark.
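A quick arithmetic check of those SACK edges backs this up (the 1460-byte MSS below is an assumption; use whatever the handshake actually negotiated):

sack_blocks = [
    (2996108614, 2996109446),
    (2996103990, 2996104270),
    (2996058358, 2996059102),
]
MSS = 1460  # assumed full segment size for this path
for left, right in sack_blocks:
    size = right - left
    print(f"{left}-{right}: {size} bytes ({'full size' if size >= MSS else 'short'})")
# -> 832, 280 and 744 bytes: none of the SACKed ranges covers a full-sized segment.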
That is what I meant. We have some more findings in the trace, but I think the most significant ones are the retransmissions, as they have a direct impact on the throughput because they reduce the sender's transmit window.
This is a very interesting case study. The underlying problem is that your servers begin to "misbehave" once throughput (packets per round trip) increases to certain levels. At first, the odd events are benign, but as load increases they become "catastrophic", causing the unnecessary retransmissions and everything that goes along with that.
There are repeating patterns within repeating patterns. The main trigger is that the packets get even more out of order after we have seen them in the supposedly "receiver side" trace and before they really reach the receiver. Small packets always overtake larger ones, but at a certain point 15 full-sized packets are overtaken by a small one plus at least 2 large ones.
This happens after point "4a" (where we see the small one has overtaken only one or two large ones) but before they subsequently arrive at the receiving server. This is evidenced by ...
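For anyone digging through the captures themselves, here is a small sketch, assuming a capture taken on the sender-to-receiver path, that separates plain retransmissions from data segments that simply arrived late: a segment whose bytes fall below the highest sequence number already seen on the flow, but whose starting sequence we have not seen before, is reordered rather than retransmitted. This is a rough heuristic, not Wireshark's exact logic, and the file name and sender IP are placeholders.

from scapy.all import rdpcap, IP, TCP

CAPTURE = "receiver_side.pcap"   # placeholder path
SENDER = "10.0.0.1"              # placeholder: IP of the iperf sender

highest_seq = {}       # flow key -> highest byte seen so far
seen_segments = set()  # (flow key, seq) pairs already observed

for i, pkt in enumerate(rdpcap(CAPTURE)):
    if not (pkt.haslayer(IP) and pkt.haslayer(TCP)):
        continue
    ip, tcp = pkt[IP], pkt[TCP]
    plen = len(tcp.payload)
    if ip.src != SENDER or plen == 0:
        continue
    key = (ip.src, tcp.sport, ip.dst, tcp.dport)
    end = tcp.seq + plen
    if key in highest_seq and end <= highest_seq[key]:
        kind = "retransmission" if (key, tcp.seq) in seen_segments else "out-of-order"
        print(f"frame {i}: {kind}, seq {tcp.seq}, len {plen}")
    highest_seq[key] = max(highest_seq.get(key, 0), end)
    seen_segments.add((key, tcp.seq))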
@naskop If you can’t see the email address in the profile, it is creusch[at]crnetpackets.com
We set up monitor sessions on each switch. On the B side the 2 monitor sessions are fed into a packet broker. The captures I provided were taken while running the iperf test on the VMware hypervisor. We also ran iperf on VMs running on the hosts in question and I didn't see this behavior, which leads me to think the issue is with the TCP stack on the hypervisor.