Low throughput between VMware hosts in a VXLAN topology: spurious retransmissions

asked 2019-04-26 17:30:52 +0000

naskop

updated 2019-04-26 17:32:27 +0000

I have a weird case of low throughput between hosts. It seems to me that the problem starts when there are 3 outstanding packets in flight that are not full size. The receiver seems to acknowledge those first, then the rest of the packets. By that time the receiver has generated 3 duplicate ACKs, which trigger the sender to retransmit packets that were already acknowledged. I took a capture at both sides while running iperf (packet captures attached). Throughput from A to B is low (around 1 Gbps); B to A is OK. If we remove one leg of the port-channel on the B side, throughput improves significantly.
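The mechanism described above can be sketched in a few lines of Python. This is a simplified model, not something derived from the captures: the function names, the segment sizes, and the arrival order are all hypothetical, and the receiver is assumed to send one cumulative ACK per arriving segment with no delayed ACKs. It only illustrates how three smaller segments overtaking one full-size segment are enough to produce three duplicate ACKs.

```python
def simulate_acks(arrivals, start_seq):
    """Return the cumulative ACK a simplified receiver sends after each arrival.

    arrivals:  list of (seq, length) tuples in the order they reach the receiver.
    start_seq: the next byte the receiver expects before the first arrival.
    """
    buffered = {}              # seq -> end of segment, for data not yet in order
    next_expected = start_seq
    acks = []
    for seq, length in arrivals:
        buffered[seq] = seq + length
        # Advance the cumulative ACK over every contiguous buffered segment.
        while next_expected in buffered:
            next_expected = buffered.pop(next_expected)
        acks.append(next_expected)
    return acks


def fast_retransmit_triggered(acks, last_ack, threshold=3):
    """True if `threshold` duplicate ACKs in a row appear in `acks`.

    last_ack is the ACK value the sender had already seen before this burst.
    """
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count >= threshold:
                return True
        else:
            dup_count = 0
            last_ack = ack
    return False


# Hypothetical burst: one full-size segment followed by three smaller ones.
in_order = [(0, 1448), (1448, 280), (1728, 744), (2472, 832)]
# Same segments, but the three small ones arrive before the full-size one.
reordered = [(1448, 280), (1728, 744), (2472, 832), (0, 1448)]

print(simulate_acks(in_order, 0))
print(simulate_acks(reordered, 0))
print(fast_retransmit_triggered(simulate_acks(reordered, 0), last_ack=0))
```

In the in-order case no duplicate ACK is produced. In the reordered case the third duplicate ACK is sent before the full-size segment arrives, which is the point at which a sender using the standard threshold of three duplicate ACKs would fast-retransmit data that is merely late, not lost.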


Comments

Have you ever tried to use the port-channel on B-side as an active-backup configuration instead of a load balancing configuration?

Christian_R ( 2019-04-27 21:37:33 +0000 )

BTW you are facing real packet loss here, too!

Christian_R ( 2019-04-27 21:38:08 +0000 )

Thanks Christian, we are not using active-standby port channels. Where do you see the packet loss? I see exactly the same number of sent and received packets at both sides.

naskop ( 2019-04-28 01:31:57 +0000 )

Yes, you are right, there is no loss. I was misled by my Wireshark, but that is a different story.

But then I think you should try an active-standby setup, as the out-of-order arrivals are slowing down your session. Or you can post the trace where you disconnected one leg, so I can prove my assumption.

Christian_R ( 2019-04-28 12:52:41 +0000 )

Captures with one of the port channel legs down.

naskop ( 2019-04-29 13:42:07 +0000 )

It seems odd that at certain times the receiver only acks packets that are not full size (and out of order), even though other packets arrived before them. It looks like some sort of buffering to me. For example, in packet #68737 (dup ACK #3) on the receiver side, the SACK field is: SACK: 2996108614-2996109446 2996103990-2996104270 2996058358-2996059102, and none of those blocks is full size. Then, in the next 6 packets, the receiver acknowledges the previously received full-size packets, but by that time it is too late: the sender has received 3 dup ACKs and re-sends the "missing" segment, which Wireshark marks as a spurious retransmission.

naskop ( 2019-05-01 16:27:01 +0000 )
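To make the "not full size" observation concrete, the SACK blocks quoted in the comment above can be turned into byte lengths. A small Python sketch follows; the helper name `sack_block_lengths` is made up for illustration, and `ASSUMED_MSS` is a hypothetical value (the thread does not state the MSS, and VXLAN overhead may make the real one smaller). A SACK block's right edge is the sequence number just past the last byte of the block, so the length is simply right minus left.

```python
def sack_block_lengths(sack_field):
    """Parse 'left-right left-right ...' into (left, right, length) tuples."""
    blocks = []
    for token in sack_field.split():
        left_text, right_text = token.split("-")
        left, right = int(left_text), int(right_text)
        # The right edge is exclusive, so the block covers right - left bytes.
        blocks.append((left, right, right - left))
    return blocks


# SACK field quoted from packet #68737 in the receiver-side capture.
SACK = "2996108614-2996109446 2996103990-2996104270 2996058358-2996059102"

# ASSUMPTION: not stated in the thread; replace with the MSS seen in the SYN.
ASSUMED_MSS = 1448

for left, right, length in sack_block_lengths(SACK):
    kind = "full size" if length >= ASSUMED_MSS else "smaller than assumed MSS"
    print(f"{left}-{right}: {length} bytes ({kind})")
```

The three blocks come out at 832, 280, and 744 bytes, so each one is well below any MSS a 1500-byte path would normally negotiate, which matches the description of small segments being acknowledged ahead of the full-size ones.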

1 Answer


answered 2019-04-29 19:17:27 +0000

Christian_R

updated 2019-04-29 19:19:57 +0000

Nice question. In the one-leg trace we cannot spot any retransmissions, and the out-of-order symptom seems slightly improved compared to the slow trace. So my recommendations are:

  • Check whether an active-standby config for the VMware access could solve the issue
  • Find out the root cause of the out-of-order arrivals. Maybe they are normal.
  • Check whether you really need the connection between the leaf switches

Comments

Christian, we've run this configuration (active-active) for years, and it will be a hard sell to have many 10 Gig links sitting idle. I think the out-of-order behavior is normal due to the multi-path. The leaf switches run in an MLAG configuration; the connection between them is a peer link for heartbeat.

naskop ( 2019-05-02 13:17:47 +0000 )

