Low throughput between VMware hosts in VXLAN topology - spurious retransmissions.

asked 2019-04-26 17:30:52 +0000

naskop
31 ●3 ●7

updated 2019-04-26 17:32:27 +0000

I have a strange case of low throughput between hosts. The problem seems to start when there are three outstanding packets in flight that are not full size. The receiver appears to acknowledge those first, and only then the rest of the packets. By that time the receiver has generated three duplicate ACKs, which triggers the sender to retransmit packets that were already acknowledged. I took a capture on both sides while running iperf. Throughput from A to B is low (around 1 Gbps); B to A is fine. If we remove one leg of the port-channel on the B side, throughput improves significantly. download packet captures
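
To illustrate the mechanism described above, here is a minimal sketch of how reordering alone can trigger a fast retransmit. It is a simplified model, not the behaviour of any particular TCP stack: segments are numbered 0, 1, 2, ..., the receiver sends one cumulative ACK per arriving segment, SACK and delayed ACKs are ignored, and the sender uses the standard threshold of three duplicate ACKs. The function names are made up for this example.

```python
DUP_ACK_THRESHOLD = 3  # standard fast-retransmit threshold (assumed, per RFC 5681)


def receiver_acks(arrival_order):
    """Return the cumulative ACK (next expected segment) sent after each arrival."""
    received = set()
    next_expected = 0
    acks = []
    for seg in arrival_order:
        received.add(seg)
        # Advance past every segment that is now contiguous.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def fast_retransmits(acks):
    """Return the segments a sender would fast-retransmit for this ACK stream."""
    retransmitted = []
    last_ack = 0  # the sender has already seen an ACK asking for segment 0
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                retransmitted.append(ack)
        else:
            last_ack = ack
            dup_count = 0
    return retransmitted


# Nothing is lost here: segment 0 just arrives after segments 1, 2 and 3.
acks = receiver_acks([1, 2, 3, 0, 4])
print(acks)                    # cumulative ACKs seen by the sender
print(fast_retransmits(acks))  # segments retransmitted although they arrived
```

In this model every segment is delivered, so any segment returned by `fast_retransmits` is a spurious retransmission. A segment overtaken by fewer than three others (for example arrival order `[1, 0, 2, 3]`) does not reach the threshold.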


Comments

Have you tried configuring the port-channel on the B side as active-backup instead of load balancing?

Christian_R ( 2019-04-27 21:37:33 +0000 )edit

BTW you are facing real packet loss here, too!

Christian_R ( 2019-04-27 21:38:08 +0000 )edit

Thanks Christian, we are not using active-standby port channels. Where do you see the packet loss? I see exactly the same number of sent and received packets at both sides:

naskop ( 2019-04-28 01:31:57 +0000 )edit

Yes, you are right, there is no loss. I was misled by my Wireshark. -> Different story

But then I think you should try an active-standby setup, as the out-of-order arrivals are slowing down your session. Or you can post the trace where you disconnected one leg, so I can verify my assumption.

Christian_R ( 2019-04-28 12:52:41 +0000 )edit

Captures with one of the port channel legs down.

naskop ( 2019-04-29 13:42:07 +0000 )edit

It seems odd that at certain times the receiver only ACKs packets that are not full size (and out of order), even though other packets arrived before them. It looks like some sort of buffering to me. For example, in packet #68737 (dup ACK #3) on the receiver side, the SACK field is: SACK: 2996108614-2996109446 2996103990-2996104270 2996058358-2996059102, none of which are full size. Then, in the next 6 packets, the receiver acknowledges the previously received full-size packets, but by then it is too late: the sender has received 3 dup ACKs and resends the "missing" segment, which Wireshark marks as a spurious retransmission.

naskop ( 2019-05-01 16:27:01 +0000 )edit
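
The SACK blocks quoted in the comment above can be checked with a short script. This is only a sketch: it parses the left and right edges from the text Wireshark displays and computes each block's length in bytes, so the lengths can be compared with the MSS negotiated in the SYN packets of the capture. The function name `sack_block_sizes` is made up for this example.

```python
import re


def sack_block_sizes(sack_text):
    """Parse 'left-right' SACK edges and return (left, right, length) tuples."""
    blocks = []
    for left, right in re.findall(r"(\d+)-(\d+)", sack_text):
        left, right = int(left), int(right)
        # Sequence numbers are 32-bit, so take the difference modulo 2**32
        # to stay correct if a block spans a sequence-number wraparound.
        blocks.append((left, right, (right - left) % 2**32))
    return blocks


sack = "SACK: 2996108614-2996109446 2996103990-2996104270 2996058358-2996059102"
for left, right, length in sack_block_sizes(sack):
    print(f"{left}-{right}: {length} bytes")
```

For the values quoted above, the three blocks come out at 832, 280 and 744 bytes.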


answered 2019-04-29 19:17:27 +0000

Christian_R

2059 ●11 ●74 ●51 http://crnetpackets.com

updated 2019-04-29 19:19:57 +0000

Nice question. In the one-leg trace we cannot spot any retransmissions, and the out-of-order symptom seems slightly improved compared to the slow trace. So my recommendations:

Check if active-standby config for the VMware access could solve the issue
Find the root cause of the out-of-order arrivals. Maybe they are normal.
Check if you really need the connection between the leaf switches


Comments

Christian, we've run in this configuration (active-active) for years, and it will be a hard sell to have many 10 Gb links sitting idle. I think the out-of-order behavior is normal due to the multi-path. The leaf switches run in an MLAG configuration; the connection between them is a peer link for heartbeat.

naskop ( 2019-05-02 13:17:47 +0000 )edit
