Slow download from server

asked 2018-06-16 20:05:27 +0000

soochi

Hello all,

Many clients download data from a server farm, and occasionally a download is slow. The server farm sits behind an F5 load balancer (LB). The F5 terminates the HTTPS connection from the client and then opens its own HTTPS session to the real server.

Two traces are attached: one of a slow download and one of a fast download. My goal is to find out which device is the root cause of this issue. The traces were captured with tcpdump on the LB.

https://drive.google.com/open?id=1Zrv...

https://drive.google.com/open?id=1buz...

I can see that the server does not send data to the LB when the LB advertises a window of one MSS or less. In the slow trace I measured around 6 seconds lost to this behavior. Can I conclude that this is a server issue?

Why does the server not send data when the window size is <= 1 MSS? Is it due to some congestion avoidance algorithm implemented on the server? Is there any known OS that behaves in this manner?
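One way to quantify this from an exported packet list is sketched below. It sums the time during which the LB's last advertised window was at or below one MSS and the server had not yet sent data. The `Pkt` record layout, the function name, and the MSS of 1460 bytes are assumptions for illustration; the actual traces would first have to be exported into such records.

```python
from typing import Iterable, NamedTuple


class Pkt(NamedTuple):
    """Hypothetical per-packet record for the LB <-> server connection."""
    t: float        # capture timestamp in seconds
    from_lb: bool   # True if sent by the LB (receiver), False if sent by the server
    window: int     # advertised receive window in bytes (used for LB packets only)
    payload: int    # TCP payload length in bytes


def small_window_stall_time(pkts: Iterable[Pkt], mss: int = 1460) -> float:
    """Sum the time spent with an advertised window <= mss and no server data."""
    total = 0.0
    stall_start = None  # time at which the LB first advertised a small window
    for p in pkts:
        if p.from_lb:
            if p.window <= mss:
                if stall_start is None:
                    stall_start = p.t
            elif stall_start is not None:
                # The LB reopened its window: the stall ends here.
                total += p.t - stall_start
                stall_start = None
        elif p.payload > 0 and stall_start is not None:
            # The server sent data despite the small window: the stall ends here.
            total += p.t - stall_start
            stall_start = None
    return total
```

Packets must be passed in capture order. A stall that is still open at the end of the list is not counted.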

Please assist. Regards


Comments

It looks like 10.41.196.49 is the client, correct? In the 13 sec trace the iRTT in the TCP handshake is 2 msec, but in the 5 sec trace the iRTT is 407 microseconds. Why is there such a huge difference?

To me it looks like the client is simply too slow to process incoming data, and that coupled with the rather small TCP Receive Window advertised by the client is the cause of the slow transfer rate. The small TCP Receive Window is also an issue in the 5sec trace and I'm sure increasing that would help, however it's also clear that something is not right on the client i.e. lack of resources. This can be seen by the high(er) delta times seen between consecutive ACKs from the client and the fact that it takes the client forever to increase its TCP Receive Window once it's filled.
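As a rough illustration of why a small receive window limits the transfer rate: a sender can have at most one receive window of data in flight per round trip, so throughput is bounded by window / RTT. The sketch below uses the two iRTT values quoted above; the 16 KiB window and the function name are assumptions chosen for illustration, not values read from the traces.

```python
def max_throughput_bytes_per_s(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is in flight per RTT."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return window_bytes / rtt_s


WINDOW = 16 * 1024  # assumed receive window in bytes (illustrative only)

for label, rtt in (("iRTT 2 ms", 2e-3), ("iRTT 407 us", 407e-6)):
    ceiling = max_throughput_bytes_per_s(WINDOW, rtt)
    print(f"{label}: at most {ceiling / 1e6:.1f} MB/s")
```

This bound only covers network round trips. A receiver that is slow to drain its buffer lowers the achieved rate further, because the window stays closed for longer than one RTT.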

NJL ( 2018-06-17 18:46:56 +0000 )

Why does the server not send data when the window size is <= 1 MSS? is it due to some congestion avoidance algorithm implemented at the server?

No, this is probably because of the "silly window syndrome avoidance" mechanism (RFC 1122, section 4.2.3.4).
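A minimal sketch of the sender-side decision described in that RFC section may help. It models only the first three "send now" conditions and leaves out the override timeout; the function and parameter names are made up for illustration, and real TCP stacks differ in detail.

```python
def sender_may_send(queued: int, usable_window: int, mss: int,
                    max_window_seen: int, pushed: bool = False,
                    nothing_in_flight: bool = True, fs: float = 0.5) -> bool:
    """Simplified sender SWS avoidance check (override timer not modelled).

    queued          -- bytes waiting in the send buffer (D in the RFC)
    usable_window   -- bytes the peer's window currently allows (U in the RFC)
    max_window_seen -- largest window the peer has advertised so far
    fs              -- fraction of max_window_seen worth sending (RFC suggests 1/2)
    """
    sendable = min(queued, usable_window)
    # (1) A full-sized segment can be sent.
    if sendable >= mss:
        return True
    # (2) Data is pushed and everything queued fits into the usable window.
    if nothing_in_flight and pushed and queued <= usable_window:
        return True
    # (3) At least a fraction fs of the largest window seen can be sent.
    if nothing_in_flight and sendable >= fs * max_window_seen:
        return True
    return False


# Illustrative values only: a large backlog and a usable window below one MSS.
print(sender_may_send(queued=100_000, usable_window=1000,
                      mss=1460, max_window_seen=16_384))
```

With a large backlog and a usable window below one MSS, none of the three conditions holds, so a sender following these rules waits for the window to open or for its override timer to expire.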

Could you please also provide detailed info on IP addresses and connections seen in the traces?

Packet_vlad ( 2018-06-18 08:36:11 +0000 )

The traces were captured on the F5, which offloads TCP to the NIC.

In each trace there are two sessions: one from the client to LB:1443, and the other from the LB to the real server:443.

Real client IP: 192.168.244.129
LB virtual IP: 192.168.231.150
LB IP towards real server: 10.41.196.49
Real server IP: 192.168.40.79

soochi ( 2018-06-18 08:41:02 +0000 )

Have you tried enabling the "Window Scaling" feature on 10.41.196.49?

Christian_R ( 2018-06-18 09:21:18 +0000 )

Ahh! Good point, Christian_R. The load balancer is actually removing the Window Scale option: the real client (192.168.244.129) announces it, but once the connection has crossed the LB the option is no longer present. However, Window Scaling is not announced by the server 192.168.40.79, so it should be enabled there, and the LB should be configured to allow Window Scaling.
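To illustrate why both sides matter: window scaling only takes effect when the SYN and the SYN/ACK both carry the Window Scale option, and without it the advertised window is capped at 65535 bytes. The sketch below assumes the shift counts have already been read from the handshake; the function names are hypothetical.

```python
from typing import Optional, Tuple

MAX_SHIFT = 14  # largest shift count permitted for the Window Scale option


def negotiated_shifts(syn_shift: Optional[int],
                      synack_shift: Optional[int]) -> Tuple[int, int]:
    """Return (shift for the SYN sender's windows, shift for the SYN/ACK sender's windows).

    None means the option was absent from that segment. If either side
    omitted it, scaling is disabled in both directions.
    """
    if syn_shift is None or synack_shift is None:
        return 0, 0
    return min(syn_shift, MAX_SHIFT), min(synack_shift, MAX_SHIFT)


def max_advertisable_window(shift: int) -> int:
    """Largest window in bytes that the 16-bit window field can express."""
    return 65535 << shift


# One side offers scaling, the other does not: both directions stay unscaled.
client_shift, server_shift = negotiated_shifts(7, None)
print(max_advertisable_window(client_shift))
```

The shift value 7 is only an example. The point is that a missing option on either SYN disables scaling for the whole connection.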

NJL ( 2018-06-18 09:31:03 +0000 )

1 Answer

answered 2018-06-25 06:22:13 +0000

soochi

Activating WS did not resolve the issue.

The issue is that the LB (192.168.231.150:1443) could not send data towards the client 192.168.244.129. This TCP session negatively affects the other TCP session, between the LB and the real server (10.41.196.49 -> 192.168.40.79:443).

F5 confirmed this as a bug and proposed an update.


Comments

@soochi Could you share a trace with WS enabled, and the current F5 version? And has the receive window on the real server grown above 16k?

Christian_R ( 2018-06-25 10:49:46 +0000 )

Unfortunately I do not have a trace with WS activated. Version 12.1.3.5 is affected by bug ID 697878. I did not notice that, as the server is uploading data. If you mean the send buffer of the server: yes, other clients could download the same 10 MB file in less than a second (with the LB not in the path).

soochi ( 2018-06-25 14:12:51 +0000 )

Yes, it seems to be the LB that causes the problem, by decreasing its window size.

Christian_R ( 2018-06-25 16:11:57 +0000 )

Hi all,

I guess the F5 could be intentionally decreasing its receive window on the server side to slow down the server, because the F5 itself couldn't keep up with the upload speed to the client, or for some other reason (the bug Sooraj mentioned).

..and of course F5 shouldn't have advertised silly windows below 1 MSS.

Packet_vlad ( 2018-06-25 17:53:49 +0000 )

Exactly, Vlad. And under worse conditions the F5 sent a zero window and reopened it only after 400 ms. :(

But the server still not sending data even when RWIN > 1 MSS is not normal.

soochi ( 2018-06-25 19:27:01 +0000 )
