Low throughput between VMware hosts in a VXLAN topology: spurious retransmissions

asked 2019-04-26 17:30:52 +0000 by naskop

updated 2019-04-26 17:32:27 +0000

Got a weird case of low throughput between hosts. It seems to me the problem starts when there are 3 outstanding packets in flight that are not full size. The receiver appears to acknowledge those first, then the rest of the packets. By that time the receiver has generated 3 duplicate ACKs, which triggers the sender to retransmit packets that were already acknowledged.

Took a capture at both sides while running iperf. Throughput from A to B is low (around 1 Gbps); B to A is OK. If we remove one leg of the port-channel on the B side, throughput improves significantly. download packet captures
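If it helps, here is a rough scapy sketch of the check I am doing by eye: flag every third duplicate ACK (the fast-retransmit trigger). The capture file name, receiver address and iperf port are placeholders, not our real values:

    # Rough sketch: flag runs of 3 duplicate ACKs in the receiver->sender
    # ACK stream. File name, address and port below are placeholders.
    from scapy.all import rdpcap, IP, TCP

    RECEIVER, IPERF_PORT = "10.0.0.2", 5001    # assumed iperf flow
    pkts = rdpcap("side_a.pcap")               # hypothetical capture file

    last_ack, dups = None, 0
    for i, p in enumerate(pkts, start=1):
        if IP not in p or TCP not in p:
            continue
        ip, tcp = p[IP], p[TCP]
        # payload size computed from IP header fields, to ignore Ethernet padding
        plen = ip.len - 4 * ip.ihl - 4 * tcp.dataofs
        # pure ACKs flowing from the receiver back to the sender
        if ip.src == RECEIVER and tcp.sport == IPERF_PORT and plen == 0:
            if tcp.ack == last_ack:
                dups += 1
                if dups == 3:
                    print(f"frame {i}: 3rd duplicate ACK for {tcp.ack} -> fast retransmit expected")
            else:
                dups = 0
            last_ack = tcp.ack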



Comments

Have you ever tried running the port-channel on the B side in an active-backup configuration instead of a load-balancing configuration?

Christian_R ( 2019-04-27 21:37:33 +0000 )

BTW you are facing real packet loss here, too!

Christian_R ( 2019-04-27 21:38:08 +0000 )

Thanks Christian, we are not using active-standby port-channels. Where do you see the packet loss? I see exactly the same number of sent and received packets on both sides.

naskop ( 2019-04-28 01:31:57 +0000 )

Yes, you are right, no loss; I was misled by my Wireshark -> different story.

But then I think you should try an active-standby setup, as the out-of-order arrivals are slowing down your session. Or you can post the trace where you disconnected one leg, so I can prove my assumption.

Christian_R ( 2019-04-28 12:52:41 +0000 )

Captures with one of the port-channel legs down.

naskop ( 2019-04-29 13:42:07 +0000 )

It seems odd that at certain times the receiver only ACKs packets that are not full size (and out of order), even though other packets arrived before them. Looks like some sort of buffering to me. For example, in packet #68737 (dup ACK #3) on the receiver side, the SACK field is: SACK: 2996108614-2996109446 2996103990-2996104270 2996058358-2996059102, none of which are full-size ranges. Then in the next 6 packets the receiver acknowledges the previously received full-size packets, but by that time it is too late: the sender has received 3 dup ACKs and re-sends the "missing" segment, marked as a spurious retransmission in Wireshark.
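Those three SACKed ranges are 832, 280 and 744 bytes, so none of them is anywhere near a full-size segment. A rough scapy sketch to dump the SACK blocks of every pure ACK (the file name is a placeholder, and I am assuming scapy's usual parsing of option kind 5 as 'SAck' with flat left/right edges):

    # Rough sketch: print SACK blocks (and their sizes) carried by pure ACKs.
    # File name is a placeholder; scapy reports option kind 5 as 'SAck'.
    from scapy.all import rdpcap, IP, TCP

    for i, p in enumerate(rdpcap("receiver_side.pcap"), start=1):  # hypothetical file
        if IP not in p or TCP not in p:
            continue
        ip, tcp = p[IP], p[TCP]
        if ip.len - 4 * ip.ihl - 4 * tcp.dataofs != 0:             # data segment, skip
            continue
        for name, val in tcp.options:
            if name == "SAck":                                     # flat (left, right, ...) edges
                blocks = list(zip(val[0::2], val[1::2]))
                sizes = [right - left for left, right in blocks]
                print(f"frame {i}: SACK {blocks} sizes {sizes}")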

naskop ( 2019-05-01 16:27:01 +0000 )

That is what I meant. There are some more findings in the trace, but I think the most significant are the retransmissions, as they have a direct impact on the throughput: they reduce the sender's transmit window.

Christian_R ( 2019-05-01 18:06:49 +0000 )

This is a very interesting case study. The underlying problem is that your servers begin to "misbehave" once throughput (packets per round trip) increases to certain levels. At first, the odd events are benign, but as load increases they become "catastrophic", causing the unnecessary retransmissions and everything that goes along with that.

There are repeating patterns within repeating patterns. The main trigger is that the packets get even more out of order after we have seen them in the supposedly "receiver side" trace and before they really get to the receiver. Small packets always overtake larger ones, but at a certain point 15 full-sized packets are overtaken by a small one plus at least 2 large ones.

This happens after point "4a" (where we see the small one has overtaken just one or two large ones) but before they subsequently arrive at the receiving server. This is evidenced by ...(more)
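A rough way to quantify that overtaking, sketched with scapy below; the capture file name and the 1448-byte full payload size are assumptions on my side, and a single bulk flow per file is assumed:

    # Rough sketch: for each data segment that arrives below the highest
    # sequence number already seen (i.e. it was overtaken), count how many
    # earlier arrivals overtook it.
    from scapy.all import rdpcap, IP, TCP

    FULL = 1448                                 # assumed full payload size
    arrivals = []                               # (seq, payload_len) in arrival order
    for p in rdpcap("receiver_side.pcap"):      # hypothetical capture file
        if IP in p and TCP in p:
            plen = p[IP].len - 4 * p[IP].ihl - 4 * p[TCP].dataofs
            if plen > 0:
                arrivals.append((p[TCP].seq, plen))

    max_seq = -1
    for idx, (seq, plen) in enumerate(arrivals):
        if seq + plen <= max_seq:               # overtaken (or a retransmission)
            overtakers = sum(1 for s, _ in arrivals[:idx] if s > seq)
            size = "full" if plen == FULL else f"{plen}B"
            print(f"arrival #{idx}: {size} segment overtaken by {overtakers} earlier packets")
        max_seq = max(max_seq, seq + plen)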

Philst ( 2019-05-02 00:25:50 +0000 )

@naskop If you can’t see the email address in the profile it is creusch[at]crnetpackets.com

Christian_R ( 2019-05-02 06:59:31 +0000 )

We set up monitor sessions on each switch. On the B side the two monitor sessions are fed into a packet broker. The captures I provided were taken while running the iperf test on the VMware hypervisor. We also ran iperf on VMs running on the hosts in question and did not see this behavior, which leads me to think the issue is with the TCP stack on the hypervisor.

naskop ( 2019-05-02 13:07:46 +0000 )

1 Answer


answered 2019-04-29 19:17:27 +0000

updated 2019-04-29 19:19:57 +0000

Nice question. In the one-leg trace we cannot spot any retransmissions, and the out-of-order symptom seems slightly improved compared to the slow trace. So my recommendations:

  • Check if an active-standby config for the VMware access could solve the issue
  • Find out the root cause of the out-of-order arrivals (the sketch below counts them per trace); maybe it is normal
  • Check if you really need the connection between the leaf switches
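For the second point, a quick scapy sketch to compare the two traces; the file names are placeholders for your two-leg and one-leg captures, and a single bulk TCP flow per file is assumed:

    # Rough sketch: compare out-of-order arrivals between the two captures.
    from scapy.all import rdpcap, IP, TCP

    def out_of_order(path):
        max_seq, count = -1, 0
        for p in rdpcap(path):
            if IP in p and TCP in p:
                plen = p[IP].len - 4 * p[IP].ihl - 4 * p[TCP].dataofs
                if plen > 0:
                    if p[TCP].seq < max_seq:
                        count += 1
                    max_seq = max(max_seq, p[TCP].seq + plen)
        return count

    for f in ("two_legs.pcap", "one_leg.pcap"):  # hypothetical file names
        print(f, out_of_order(f))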

Comments

Christian, we've run in this configuration (active-active) for years, and it would be a hard sell to have many 10-gig links sitting idle. I think the out-of-order behavior is normal due to the multi-pathing. The leaf switches run in an MLAG configuration; the connection between them is a peer link for heartbeat.

naskop ( 2019-05-02 13:17:47 +0000 )

