I have an issue where a sender sends a bursty chunk of data across a 50 Mbps WAN link to a receiver, and some of the packets are lost in transit.

However, instead of sending DUP ACKs to the sender for the packets it did not receive (it does receive the out-of-order packets with higher SEQ numbers), the receiver repeatedly sends many window_update packets, each time raising the receive window by 1 or 2 units (the window scale is 12, i.e. a multiplier of 4096 bytes).

This is despite the fact that the bytes in flight are much lower than the existing receive window (around 40 kbytes in flight versus 300 × 4096 ≈ 1.2 Mbytes of receive window).

The end result is that instead of getting multiple DUP ACKs from the receiver, which would trigger a fast retransmit, the sender eventually timed out on its RTO 200 ms later and only then sent the lost packet. That retransmission was ACKed by the receiver, but the issue does not end there: the sender waited a binary backoff time, i.e. 400 ms, before sending the next lost packet, 800 ms for the third, and so on. Throughput slowed down tremendously.
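The 200 ms / 400 ms / 800 ms pattern above is the standard exponential (binary) backoff of the retransmission timer. A minimal sketch of the schedule (the 200 ms starting RTO is taken from this trace; it matches the common Linux RTO floor, but is not universal):

```python
# Sketch of RTO binary backoff: each successive timeout on the same
# segment doubles the retransmission timer (RFC 6298, rule 5.5).
# 200 ms initial RTO is the value observed in this trace.

def backoff_schedule(initial_rto_ms, timeouts):
    """Return the RTO (ms) used for each successive timeout."""
    rto = initial_rto_ms
    schedule = []
    for _ in range(timeouts):
        schedule.append(rto)
        rto *= 2  # back off the timer for the next timeout
    return schedule

print(backoff_schedule(200, 4))  # [200, 400, 800, 1600]
```

With several lost packets each paying one or more of these timeouts instead of a ~RTT-scale fast retransmit, the throughput collapse described above follows directly.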

What can I do to remediate this situation?

Questions:

1. Is the receiver behaving correctly? I expected it to send DUP ACKs instead of window_update packets.
2. Can the sender reset the RTO timer after getting the first ACK for a retransmission? I read that TCP NewReno has partial-acknowledgement handling that can speed up the recovery.

Does the connection have SACK enabled?

Edit: Added the missing and vital S to ACK.

( 2019-06-29 17:34:20 +0000 )

I think @grahamb means SACK :-)

Are you able to post a capture file on a public file-sharing service like Dropbox, OneDrive, etc.? If so, please do and share the link here (and please make sure there is no sensitive data in it; you can slice the packets after the TCP header for this analysis).

( 2019-06-29 18:19:49 +0000 )

SACK is not enabled. https://www.dropbox.com/sh/akzfjk2nb7... I have removed the data field, leaving the headers behind.

( 2019-06-29 19:11:14 +0000 )

Thx for the traces. Do you know the value of the scaling factor the client and the receiver are advertising? Or even better, could you provide us a trace that includes the 3-way handshake?

( 2019-06-30 11:42:32 +0000 )

I am still trying to get the 3-way TCP handshake, which happened hours before the segment of the capture that shows the above issue (i.e. the application complained of data loss in the network after waiting for a long time, due to the numerous RTOs with binary backoff). However, from previous traces, the window scale value of the receiver was 12, indicating a 4096-byte window scaling factor (i.e. 2^12). I don't think it changes every session, but let me confirm again once I get the 3-way handshake for the above tcpdump in a day or two.

( 2019-06-30 14:05:09 +0000 )


Interesting case that requires some RFC reading. I did not read all the RFCs on the subject, but some reading resulted in the following:

RFC 5681 states in paragraph 2:

   DUPLICATE ACKNOWLEDGMENT: An acknowledgment is considered a
"duplicate" in the following algorithms when (a) the receiver of
the ACK has outstanding data, (b) the incoming acknowledgment
carries no data, (c) the SYN and FIN bits are both off, (d) the
acknowledgment number is equal to the greatest acknowledgment
received on the given connection (TCP.UNA from [RFC793]) and (e)
the advertised window in the incoming acknowledgment equals the
advertised window in the last incoming acknowledgment.
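The five conditions above translate almost directly into code. A minimal sketch (the `seg`/`conn` objects and their field names are illustrative stand-ins, not any real stack's structures), which also shows why the window-update ACKs from the question fail condition (e):

```python
from types import SimpleNamespace

def is_duplicate_ack(seg, conn):
    """Check RFC 5681 section 2's five conditions for a 'duplicate' ACK.

    `seg` is an incoming segment, `conn` the sender's connection state;
    both are hypothetical objects for illustration.
    """
    return (
        conn.outstanding_data > 0                      # (a) sender has data in flight
        and len(seg.payload) == 0                      # (b) ACK carries no data
        and not seg.syn and not seg.fin                # (c) SYN and FIN both off
        and seg.ack == conn.snd_una                    # (d) ACK number == greatest ACK seen
        and seg.window == conn.last_advertised_window  # (e) advertised window unchanged
    )

# The scenario from the question: same ACK number, but the window grows.
conn = SimpleNamespace(outstanding_data=40000, snd_una=5000, last_advertised_window=300)
dup_ack = SimpleNamespace(payload=b"", syn=False, fin=False, ack=5000, window=300)
win_update = SimpleNamespace(payload=b"", syn=False, fin=False, ack=5000, window=301)

print(is_duplicate_ack(dup_ack, conn))     # True  -> counts toward fast retransmit
print(is_duplicate_ack(win_update, conn))  # False -> just a window update
```

Condition (e) is exactly why the receiver's window_update packets never count toward the sender's duplicate-ACK threshold.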


This means that the receiver's ACKs do not count as duplicate ACKs when it is increasing its window size.

I would say this is a less-than-optimal implementation on the side of the receiver and should be fixed.

RFC 2988 states in paragraph 5:

   Note that after retransmitting, once a new RTT measurement is
obtained (which can only happen when new data has been sent and
acknowledged), the computations outlined in section 2 are performed,
including the computation of RTO, which may result in "collapsing"
RTO back down after it has been subject to exponential backoff
(rule 5.5).


So I would expect the RTO timer to be reset between these retransmissions, so that the backoff mechanism only affects packets with the same sequence number. However, since TCP segmentation offloading is in place, the behavior is now also dependent on the implementation of the TSO feature. When I look at the packets, I think the TSO feature does not work very well with the OS TCP stack.
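The "collapsing" described in the RFC quote can be sketched with the RFC 2988 computation itself: one fresh RTT measurement from newly acknowledged data recomputes RTO from SRTT and RTTVAR, wiping out the accumulated backoff. The constants are from the RFC; the RTT values and the 200 ms floor are illustrative:

```python
# Sketch of RFC 2988 (now RFC 6298) RTO maintenance.
ALPHA, BETA, K = 1 / 8, 1 / 4, 4
RTO_MIN = 0.2  # 200 ms minimum, as commonly used on Linux (illustrative)

class RtoState:
    def __init__(self, first_rtt):
        # First measurement (RFC 2988 rule 2.2)
        self.srtt = first_rtt
        self.rttvar = first_rtt / 2
        self.rto = max(RTO_MIN, self.srtt + K * self.rttvar)

    def on_measurement(self, rtt):
        # Subsequent measurements (rule 2.3): update RTTVAR, then SRTT, then RTO
        self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - rtt)
        self.srtt = (1 - ALPHA) * self.srtt + ALPHA * rtt
        self.rto = max(RTO_MIN, self.srtt + K * self.rttvar)

    def on_timeout(self):
        self.rto *= 2  # exponential backoff (rule 5.5)

state = RtoState(0.05)                # 50 ms RTT
state.on_timeout(); state.on_timeout()
backed_off = state.rto                # 0.8 s after two timeouts
state.on_measurement(0.05)            # new data ACKed -> fresh RTT sample
assert state.rto < backed_off         # RTO "collapses" back down to 0.2 s
```

The point of contention in the trace is that this collapse apparently never happened between the successive lost packets, so each one paid a longer and longer timeout.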

If you can do without TSO, I would suggest testing with it disabled to see whether the backoff mechanism is then applied only to packets with the same sequence number and not to all following packets.


Thanks for your comment, really appreciate it. We have asked the sysadmin on the receiver side about their implementation of the TCP stack. BTW, we found out today from the 3-way handshake that both sides use window scale 12 (multiplier 4096).

Just a point to clarify: by TSO, do you mean the way the sender hands a big chunk of data, like 13800 bytes, to the network card and offloads to it the duty of segmenting that into the right MTU size?

On the sender side, I was trying to read up a little more and I came across this section. That's why I was wondering whether it would help if the sender side turned on TCP NewReno:

16.3.1. NewReno — One problem with fast recovery is that when multiple packets are dropped in a window of data, once one packet is recovered (i.e., successfully delivered and ACKed), a good ACK can be …
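The NewReno mechanism the quoted section is leading up to (RFC 6582) is that, during fast recovery, a "partial ACK" — one that advances the window but not past the highest sequence number sent when loss was detected — is itself taken as proof that the next segment was also lost, so it is retransmitted immediately instead of waiting for an RTO. A rough sketch of that classification (sequence numbers and the string return values are illustrative):

```python
def on_ack_in_recovery(ack, snd_una, recover):
    """Classify an ACK arriving during NewReno fast recovery (sketch of RFC 6582).

    ack     - acknowledgment number of the incoming segment
    snd_una - oldest unacknowledged sequence number before this ACK
    recover - highest sequence number sent when loss was first detected
    """
    if ack <= snd_una:
        return "duplicate"            # doesn't advance the window
    if ack < recover:
        # Partial ACK: the hole up to `ack` is filled, but another hole
        # remains; retransmit the next missing segment right away and
        # stay in fast recovery -- no RTO needed.
        return "retransmit-next-hole"
    # Full ACK: everything up to `recover` has arrived; exit recovery.
    return "exit-recovery"

print(on_ack_in_recovery(6000, snd_una=5000, recover=9000))  # retransmit-next-hole
```

Note, though, that NewReno only helps once fast recovery has been entered — which in this trace never happens, because the receiver's window updates don't count as duplicate ACKs in the first place. Enabling SACK on both ends would likely help more.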

( 2019-07-01 15:33:00 +0000 )

Yes, by TSO I mean TCP segmentation offloading (I should have added the acronym within parentheses after writing it out in full) :-)

I'm not sure how a network card driver and the OS really work together in a TSO setup, but since the network card is the only one that knows which packets were really sent, I guess the part of the TCP stack responsible for congestion avoidance and the like must also run inside the network card or its driver. Choosing a different TCP stack at the OS layer might not make much difference if the NIC and its driver don't implement the same behavior. But as I said, I have no experience with how things work under the hood when TSO is enabled, so if someone knows, please do tell :-)

Turning off TSO is worth a troubleshooting shot, though, in my opinion!

( 2019-07-01 22:34:24 +0000 )

@SYN-bit: I agree. But I just wonder about a second spot: how long would you assume the receiver keeps the out-of-order packets if SACK is not used (like in this case)? In older implementations, I thought they were discarded immediately...

( 2019-07-02 19:49:17 +0000 )

@Christian_R I would say it depends... on the TCP implementation. But the receive buffer is reserved and can't be used for anything else, so I see no point in throwing away the data.

( 2019-07-02 19:53:52 +0000 )

@SYN-bit: Of course you are right. Thx. I was somehow on the wrong track...

( 2019-07-02 20:38:12 +0000 )