Slow download from server

asked 2018-06-16 20:05:27 +0000

soochi

Hello all,

Many clients download data from a server farm, and occasionally a download is slow. The server farm is located behind an F5 load balancer (LB). The F5 terminates the HTTPS connection from the client and then opens its own HTTPS session to the real server.

Two traces are attached: one slow and one fast. My goal is to find out which device is the root cause of this issue. The traces were captured with tcpdump on the LB.

https://drive.google.com/open?id=1Zrv...

https://drive.google.com/open?id=1buz...

I can see that the server does not send data to the LB when the LB advertises a window size of one MSS or lower. In the slow trace I measured around 6 seconds lost to this behavior. Can I therefore conclude that this is a server issue?
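One way such a stall measurement could be reproduced is sketched below. It assumes the relevant packets have already been exported into a flat list of (timestamp, advertised window, server payload) tuples; that format, the `stalled_time` name and the sample values are all hypothetical and not taken from the attached traces.

```python
from typing import List, Tuple

# (timestamp in seconds, window advertised by the LB in bytes,
#  payload bytes received from the server at that moment).
# Hypothetical export format; tcpdump does not produce this directly.
Sample = Tuple[float, int, int]


def stalled_time(samples: List[Sample], mss: int = 1448) -> float:
    """Sum the intervals in which the advertised window was <= mss
    and the next observed event carried no server payload."""
    total = 0.0
    for (t0, window, _), (t1, _, payload) in zip(samples, samples[1:]):
        # The window seen at t0 is the one in force until t1.
        if window <= mss and payload == 0:
            total += t1 - t0
    return total


# Made-up sample data, only to show the call.
samples = [
    (0.00, 16384, 1448),  # normal transfer
    (0.10, 1448, 0),      # window shrinks to one MSS
    (0.50, 1448, 0),      # still nothing from the server
    (0.90, 8192, 0),      # window reopens
    (0.95, 8192, 1448),   # server resumes sending
]
print(f"stalled for {stalled_time(samples):.2f} s")
```

Running the same function over both traces would give a comparable figure for the slow and the fast transfer.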

Why does the server not send data when the window size is <= 1 MSS? Is it due to some congestion avoidance algorithm implemented on the server? Is there any known OS that behaves in this manner?

Please assist. Regards

Comments

It looks like 10.41.196.49 is the client - correct? In the 13sec trace the iRTT in the TCP Handshake is 2msec, but in the 5sec trace, the iRTT is 407 microseconds - why is there such a huge difference?

To me it looks like the client is simply too slow to process incoming data, and that, coupled with the rather small TCP Receive Window advertised by the client, is the cause of the slow transfer rate. The small TCP Receive Window is also an issue in the 5sec trace and I'm sure increasing it would help. However, it's also clear that something is not right on the client, e.g. a lack of resources. This can be seen in the high(er) delta times between consecutive ACKs from the client and in the fact that it takes the client forever to increase its TCP Receive Window once it's filled.

NJL ( 2018-06-17 18:46:56 +0000 )

Why does the server not send data when the window size is <= 1 MSS? Is it due to some congestion avoidance algorithm implemented on the server?

No, this is probably due to the sender-side "Silly Window Syndrome avoidance" mechanism (RFC 1122, section 4.2.3.4).
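As a rough illustration of that mechanism, here is a minimal Python sketch of the sender-side send decision described in RFC 1122, section 4.2.3.4. The function and parameter names are my own, the numbers in the examples are illustrative, and a real TCP stack will differ in detail.

```python
def sender_may_send(queued: int, usable_window: int, eff_snd_mss: int,
                    max_snd_wnd: int, pushed: bool, nothing_in_flight: bool,
                    override_timeout_expired: bool = False,
                    fs: float = 0.5) -> bool:
    """Sender-side SWS avoidance, sketched after RFC 1122 sec. 4.2.3.4.

    queued        -- D, bytes waiting to be sent
    usable_window -- U = SND.UNA + SND.WND - SND.NXT
    fs            -- fraction Fs, recommended value 1/2
    """
    sendable = min(queued, usable_window)
    # (1) A maximum-sized segment can be sent.
    if sendable >= eff_snd_mss:
        return True
    # (2) Data is pushed and all queued data fits into the window.
    if nothing_in_flight and pushed and queued <= usable_window:
        return True
    # (3) At least a fraction Fs of the largest window seen can be sent.
    if nothing_in_flight and sendable >= fs * max_snd_wnd:
        return True
    # (4) Data is pushed and the override timeout has expired.
    if pushed and override_timeout_expired:
        return True
    return False


# Window of exactly one effective MSS: a full segment fits.
print(sender_may_send(100_000, 1448, 1448, 16384, False, True))  # True
# Window slightly below what the sender considers a full segment.
print(sender_may_send(100_000, 1448, 1460, 16384, False, True))  # False
```

The second call shows why a sender with plenty of queued data can sit idle when the offered window is even a few bytes short of its own idea of a full segment, until the override timer fires.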

Could you please also provide detailed info on IP addresses and connections seen in the traces?

Packet_vlad ( 2018-06-18 08:36:11 +0000 )

The traces were captured on the F5, which offloads TCP to the NIC.

Each trace contains 2 sessions. One is from the client to LB:1443 and the other is from the LB to real server:443.

Real client IP: 192.168.244.129
LB virtual IP: 192.168.231.150
LB IP towards real server: 10.41.196.49
Real server IP: 192.168.40.79
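With those addresses, the two legs can be told apart mechanically. The following is a small sketch; the `leg` helper and the label strings are just illustrative names.

```python
# Address roles as listed above.
ROLES = {
    "192.168.244.129": "real client",
    "192.168.231.150": "LB virtual IP",
    "10.41.196.49": "LB towards real server",
    "192.168.40.79": "real server",
}

CLIENT_SIDE = {"192.168.244.129", "192.168.231.150"}
SERVER_SIDE = {"10.41.196.49", "192.168.40.79"}


def leg(src: str, dst: str) -> str:
    """Return which of the two TCP sessions a packet belongs to."""
    endpoints = {src, dst}
    if endpoints == CLIENT_SIDE:
        return "client-side"
    if endpoints == SERVER_SIDE:
        return "server-side"
    return "unknown"


print(leg("192.168.40.79", "10.41.196.49"))  # server-side
```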

soochi ( 2018-06-18 08:41:02 +0000 )

Have you tried enabling the "Window Scaling" feature on 10.41.196.49?

Christian_R ( 2018-06-18 09:21:18 +0000 )

Ahh! Good point Christian_R. The load balancer is actually stripping the Window Scale option: the real client (192.168.244.129) announces it, but once the connection has crossed the LB, the option is no longer present. Window Scaling is, however, not announced by the server 192.168.40.79, so it should be enabled there, and the LB should be configured to allow Window Scaling.
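To put numbers on why the window matters: a sender can have at most one receive window in flight per round trip, so window / RTT is an upper bound on throughput. The sketch below uses the two iRTT values mentioned earlier in the thread (2 ms and 407 microseconds); the function name is mine, and the bound ignores congestion control, loss and processing delays.

```python
def max_throughput(window_field: int, rtt_seconds: float, shift: int = 0) -> float:
    """Upper bound on throughput in bytes per second: at most one
    receive window can be in flight per round trip."""
    if not 0 <= shift <= 14:
        raise ValueError("window scale shift must be between 0 and 14")
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field is a 16-bit value")
    effective_window = window_field << shift  # scaled window in bytes
    return effective_window / rtt_seconds


for rtt in (0.002, 0.000407):
    for window in (1448, 65535):
        mbyte_s = max_throughput(window, rtt) / 1e6
        print(f"RTT {rtt * 1e3:.3f} ms, window {window:5d} B: "
              f"at most {mbyte_s:.2f} MB/s")
```

Without the Window Scale option the window can never exceed 65535 bytes, and a window stuck near one MSS caps the transfer far below that.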

NJL ( 2018-06-18 09:31:03 +0000 )

1 Answer

answered 2018-06-25 06:22:13 +0000

soochi

Activating WS did not resolve the issue.

The issue is that the LB (192.168.231.150:1443) could not send data towards the client 192.168.244.129. This TCP session negatively affects the other TCP session between the LB and the real server (10.41.196.49 -> 192.168.40.79:443).

F5 confirmed this as a bug and proposed an update.

Comments

@soochi Could you share a trace with WS enabled and tell us the current F5 version? And has the receive window on the real server grown above 16k?

Christian_R ( 2018-06-25 10:49:46 +0000 )

Unfortunately I do not have a trace with WS activated. The F5 version is 12.1.3.5, which is affected by bug ID697878. I did not notice that, as the server is the one uploading data. If you mean the send buffer of the server: yes, other clients could download the same 10 MB file in less than a second (LB not in path).

soochi ( 2018-06-25 14:12:51 +0000 )

Yes, it seems to be the LB that causes the problem by decreasing its window size.

Christian_R ( 2018-06-25 16:11:57 +0000 )

Hi all,

I guess the F5 could be intentionally decreasing its receive window on the server side to slow down the server, because the F5 itself couldn't keep up with the upload speed towards the client, or for some other reason (the bug Sooraj mentioned).

...and of course the F5 shouldn't have advertised silly windows below 1 MSS.

Packet_vlad ( 2018-06-25 17:53:49 +0000 )

Exactly, Vlad. And under worse conditions the F5 even advertised a zero window and reopened it only after 400 ms. :(

But still, it is not normal that the server does not respond even when RWIN > 1 MSS.

soochi ( 2018-06-25 19:27:01 +0000 )

Oops... There is a bug in Wireshark 2.6.1 with the calculated window size when window scaling is not in use.

So sorry for not seeing the strange window.

Christian_R ( 2018-06-25 21:16:08 +0000 )

Yes, in the slow trace packets 1442 and 1443 show a gap in transmission from the left side to the right side of the load balancer.

But I can't see that the server is not answering...

Christian_R ( 2018-06-25 21:41:12 +0000 )

Hey Christian, could you please tell me which bug you are talking about? Has it been submitted to Bugzilla?

Packet_vlad ( 2018-06-26 08:50:19 +0000 )

About "RWIN > 1 MSS" sending problem.

It's an interesting topic.

As I understand it, the server advertises an MSS of 1460 bytes in its SYN packet. At the same time, the client mostly advertises an RWIN of 1448 bytes (payload size only!). Could this 1460 <-> 1448 byte mismatch have some meaning?

Christian, what is considered the "MSS" in that case? The 1460 in the SYN or the 1448 actually sent in every packet? Also, where does the constant frame length of 1641 come from?

We spend a lot of time sitting in "advertised 1448 RWIN" state (see picture) not sending anything.

Packet_vlad ( 2018-06-26 09:38:49 +0000 )

One part of the problem is that we have a host trace. That's why we see frames as large as 4374 bytes even though only 1500 are allowed.

The other problem is that the trace is truncated, which makes it hard to verify where these 1641 bytes come from, as in the IP header we can clearly see 1500 bytes.

BTW, it is allowed to advertise windows smaller than 1 MSS if the window is decreasing...

But the counterpart should generally not send any data in that case...

I think this 1448 might be right from the internal point of view. But in the trace we see that it is ignored. And the Silly Window algorithm states it clearly:

It is allowed to advertise windows smaller than 1 MSS if the window is decreasing... But the counterpart should generally not send any data in that case.

(Source is Stevens)

Christian_R ( 2018-06-26 13:25:02 +0000 )

From Stevens:

"When operating as a receiver, small windows are not advertised. The receive algorithm specified by [RFC1122] is to not send a segment advertising a larger window than is currently being advertised (which can be 0) until the window can be increased by either one full-size segment (i.e., the receive MSS) or by one-half of the receiver’s buffer space, whichever is smaller."

From [RFC1122]:

"Note also that the receiver must use its own Eff.snd.MSS, assuming it is the same as the sender's."

In our case both MSS values are 1448 bytes (the TCP header carries 12 extra bytes: the timestamp option plus 2 NOPs). So in fact the receiver may advance its RWIN by 1448 bytes or more. Of course it shouldn't do that if RWIN < MSS, but sometimes it does.
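The arithmetic and the receiver-side rule quoted above can be sketched in a few lines of Python. The window-advance condition follows the suggested algorithm in RFC 1122, section 4.2.3.3; the function names and the buffer sizes in the examples are assumptions for illustration, not values read from the traces.

```python
def effective_mss(announced_mss: int, tcp_option_bytes: int) -> int:
    """Payload that fits into one segment once TCP options are subtracted."""
    return announced_mss - tcp_option_bytes


def may_advance_window(rcv_buff: int, rcv_user: int, rcv_wnd: int,
                       eff_snd_mss: int, fr: float = 0.5) -> bool:
    """Receiver-side SWS avoidance, sketched after RFC 1122 sec. 4.2.3.3.

    rcv_buff -- total receive buffer space
    rcv_user -- received data not yet consumed by the application
    rcv_wnd  -- window currently advertised
    fr       -- fraction Fr, recommended value 1/2
    """
    possible_increase = rcv_buff - rcv_user - rcv_wnd
    return possible_increase >= min(fr * rcv_buff, eff_snd_mss)


# 1460 announced in the SYN, minus timestamps (10 bytes) and 2 NOPs.
mss = effective_mss(1460, 12)
print(mss)  # 1448

# Hypothetical 16 KiB buffer, 1448 bytes currently advertised.
print(may_advance_window(16384, 14000, 1448, mss))  # False, only 936 B freed
print(may_advance_window(16384, 13000, 1448, mss))  # True, 1936 B freed
```

A receiver following this rule would keep the right window edge fixed until at least one full segment of buffer space has been freed.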

The other part is: the server/sender has the same MSS of 1448. The server doesn ...(more)

Packet_vlad ( 2018-06-26 14:31:31 +0000 )

The 1641-byte frames are caused by the f5ethtrailer (123 bytes).

Frame 150: 1641 bytes on wire (13128 bits), 1641 bytes captured (13128 bits)
    Encapsulation type: Ethernet (1)
    Arrival Time: Jun 11, 2018 15:08:14.088432000 Romance Daylight Time
    [Time shift for this packet: 0.000000000 seconds]
    Epoch Time: 1528722494.088432000 seconds
    [Time delta from previous captured frame: 0.000001000 seconds]
    [Time delta from previous displayed frame: 0.000001000 seconds]
    [Time since reference or first frame: 1.152888000 seconds]
    Frame Number: 150
    Frame Length: 1641 bytes (13128 bits)
    Capture Length: 1641 bytes (13128 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: eth:ethertype:vlan:ethertype:ip:tcp:ssl:f5ethtrailer]
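The numbers add up if one assumes a standard 14-byte Ethernet header, a single 4-byte 802.1Q VLAN tag (both appear in the protocol list of the frame) and no FCS in the capture. A quick check:

```python
ETHERNET_HEADER = 14   # dst MAC + src MAC + EtherType
VLAN_TAG = 4           # single 802.1Q tag, as in "eth:ethertype:vlan"
IP_TOTAL_LENGTH = 1500 # value seen in the IP header
F5_ETH_TRAILER = 123   # f5ethtrailer size mentioned above

frame_length = ETHERNET_HEADER + VLAN_TAG + IP_TOTAL_LENGTH + F5_ETH_TRAILER
print(frame_length)  # 1641
```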

soochi ( 2018-06-27 07:26:17 +0000 )

Last updated: Jun 25 '18