Edit 2: It's interesting how "sleeping on it" can make things clearer in one's mind.
I now do think that there is a bug in the way the receiver handles SACKs - exactly as @SYN-bit pointed out right from the start.
The count of "Dup-Acks" (which are SACKs here) tells us how many data packets did get through to the receiver. In both traces, there were more SACKs than the SACK maximum RE included in the actual SACK LE/RE fields.
The conclusion is that the device sending the SACKs (bear in mind that it could be some intermediate device such as a load balancer, instead of the NAS) reaches some sort of arbitrary SACK data limit and decides to stop adding to the SACK RE values, even though it has received more data.
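If anyone wants to check this on their own traces, here's a rough way to compare the two numbers. A minimal Python sketch using dpkt (the library choice, the file name, the receiver port and the MSS are all assumptions, and it naively counts every receiver packet carrying a SACK option as one Dup-ACK):

```python
import struct
import dpkt

MSS = 1460               # assumed segment size
RECEIVER_PORT = 445      # hypothetical: the port of the host sending the SACKs

dup_acks = 0
max_right_edge = 0

with open("trace.pcap", "rb") as f:                 # hypothetical file name
    for _ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        ip = eth.data
        # IPv4-only, for brevity
        if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
            continue
        tcp = ip.data
        if tcp.sport != RECEIVER_PORT:
            continue
        for kind, data in dpkt.tcp.parse_opts(tcp.opts):
            if kind == dpkt.tcp.TCP_OPT_SACK and len(data) >= 8:
                dup_acks += 1
                # SACK option data is a run of 32-bit LE/RE pairs
                edges = struct.unpack("!%dI" % (len(data) // 4), data)
                max_right_edge = max(max_right_edge, max(edges[1::2]))

print("Dup-ACKs carrying SACKs:", dup_acks)
print("Highest SACK right edge:", max_right_edge)
# Each Dup-ACK should mean one more data segment arrived, so if
# dup_acks * MSS reaches well beyond the range the SACK blocks ever
# cover, the receiver is hearing more data than its SACKs admit to.
```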
Looking at the charts included by @Christian_R, those SACKs in the green rectangle should continue to slope upwards - like they do in the red rectangle above it.
The maker of the NAS (or whatever is sending these SACKs) should be encouraged to take a look at their TCP stack to investigate this possible bug.
That said, by far the main reason for the "dozens of seconds" delay in the file transfer is the very timid congestion avoidance/loss recovery algorithm used in the Mac. I'd guess that the problem isn't apparent for Windows clients because they now use a more aggressive algorithm such as Cubic.
Although this Mac behaviour is completely valid as far as the TCP RFCs go, it would be great if Apple could be convinced to "update" their congestion avoidance algorithm - as Microsoft have done. Even a slight modification to send 2 retransmissions at a time (rather than 1) would improve this situation greatly, because most of the 100ms "Delayed ACKs" likely wouldn't happen.
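To put rough numbers on that (all of these are assumed, not measured from the trace: ~400 KB to retransmit, 1460-byte segments, a 100ms delayed-ACK timer, ~10ms RTT):

```python
# Back-of-the-envelope cost of the delayed-ACK pacing (all values assumed).
lost_bytes = 400_000
mss = 1460
delayed_ack = 0.100
rtt = 0.010

segments = -(-lost_bytes // mss)             # ceiling division: ~274 segments

# One retransmission per ACK: every lone segment waits out the 100ms timer.
one_at_a_time = segments * (delayed_ack + rtt)

# Two at a time: the second segment triggers an immediate ACK, so the
# delayed-ACK timer never fires and each round costs roughly one RTT.
two_at_a_time = (segments / 2) * rtt

print(f"1 segment per ACK:  ~{one_at_a_time:.0f} s")   # ~30 s - the "dozens of seconds"
print(f"2 segments per ACK: ~{two_at_a_time:.1f} s")   # ~1.4 s
```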
Does anyone know if the Apple Mac has an option to modify the congestion avoidance algorithm?
Fixing the SACK bug would also at least reduce the number of retransmissions that each incur a delayed ACK, shaving off a large number of the 100ms delays.
Lastly, the whole "slowness" thing is triggered by packet losses. The only "fix" that is actually within your power, @TomLaBaude, might be to see if you can find where packets are being lost and somehow reduce that (such as adding more buffer space in a router, or implementing shaping in a router that already has more memory).
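For what it's worth, one way to hunt for the loss point is to capture on both sides of each suspect device and diff the data segments seen. A hedged sketch using pyshark (the file names and sender IP are made up; it relies on the standard tcp.seq_raw field, so it needs a reasonably recent tshark):

```python
import pyshark

def data_seqs(path, sender_ip):
    """Collect raw TCP sequence numbers of data-bearing segments from sender_ip."""
    seqs = set()
    cap = pyshark.FileCapture(
        path, display_filter=f"ip.src == {sender_ip} && tcp.len > 0")
    for pkt in cap:
        seqs.add(int(pkt.tcp.seq_raw))   # absolute sequence numbers
    cap.close()
    return seqs

before = data_seqs("before_router.pcap", "10.0.0.5")   # hypothetical captures
after = data_seqs("after_router.pcap", "10.0.0.5")
print(len(before - after), "segments seen before the device but not after it")
```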
It is definitely an interesting case study. Thanks to Tom for sharing it.
Original: I don't think there's a bug with SACK behaviour.
It looks like a group of 12 packets got lost after the sniffer saw them - the SACKs tell us that. About 250+ subsequent packets did get through; however, about 400 KB of data packets after that didn't get through either.
The 12 lost packets (as reported by lots of SACKs) were eventually retransmitted one per RTT, with a SACK acknowledging each one. These SACKs come without any delay, since TCP is still in loss recovery mode.
After those 12 are all done, all the 250+ are properly ACKed. From the receiver's viewpoint, we're no longer in loss recovery mode, and so it appears that the "Delayed ACK" mechanism comes into play. Then the large number (~400 KB) of lost packets are retransmitted one at a time, with a 100ms delay before the corresponding ACK is received for each one.
I think the story is pretty much correctly detailed in everyone's Ask WS responses - except that there doesn't seem to be anything wrong with the SACKs.
Questions I'd have are:

- Why are bulk packets being lost downstream (first the 12, then the large group of about 400 KB)?
- Why did the sender wait about 11ms before retransmitting the first of the 12 "reported missing by SACKs" packets?
- I think the "one retransmission per 100ms" pattern is explained by the "Delayed ACK" mechanism once we got past the loss recovery mode (see the toy model below).
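To make that last point concrete, here's a toy model of the classic delayed-ACK rule: ACK every second in-order segment immediately, otherwise ACK when the timer fires (100ms assumed here, as in the trace). It leaves out the immediate-ACK-on-out-of-order rule (that rule is why the SACKs during loss recovery arrived without delay), which is why each lone retransmission afterwards eats a full timer period:

```python
def ack_times(arrivals, delayed_ack=0.100):
    """Return ACK emission times for a list of in-order segment arrival times."""
    acks = []
    pending_since = None                    # arrival time of an un-ACKed segment
    for t in arrivals:
        if pending_since is not None and t - pending_since >= delayed_ack:
            acks.append(pending_since + delayed_ack)   # timer fired first
            pending_since = None
        if pending_since is None:
            pending_since = t               # hold this ACK (delayed)
        else:
            acks.append(t)                  # every 2nd segment: immediate ACK
            pending_since = None
    if pending_since is not None:
        acks.append(pending_since + delayed_ack)
    return acks

# Back-to-back data gets prompt ACKs...
print(ack_times([0.000, 0.001, 0.002, 0.003]))   # [0.001, 0.003]
# ...but one-at-a-time retransmissions each wait out the 100ms timer.
print(ack_times([0.000, 0.200, 0.400]))          # [0.1, 0.3, 0.5]
```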
Edit: After more thinking - and looking at the newer, cleaner trace file - I see that I missed a particular behaviour that @SYN-bit discusses. That is, the number of SACKs (aka Dup-ACKs) received by the sender implies that the receiver did receive more data packets (each Dup-ACK is triggered by receipt of a data packet) than it acknowledges in the SACK left and right edge fields.
The sender thereby treats too many "post actual loss" data packets as also lost, with subsequent unnecessary retransmissions.
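One way to see the consequence: anything the receiver never SACKs, the sender must eventually resend. A tiny, deliberately simplified sketch (the segment numbers are arbitrary):

```python
def must_retransmit(in_flight, sacked):
    """Segments the sender will eventually resend: un-ACKed and never SACKed."""
    return [s for s in in_flight if s not in sacked]

in_flight = list(range(1, 11))   # segments 1..10 outstanding; 1-3 truly lost

# Honest receiver: SACKs everything it got (4..10) -> only the real holes resent
print(must_retransmit(in_flight, {4, 5, 6, 7, 8, 9, 10}))   # [1, 2, 3]

# Buggy receiver: also got 7..10, but its SACK right edge stalled at 6
print(must_retransmit(in_flight, {4, 5, 6}))   # [1, 2, 3, 7, 8, 9, 10]
```

The last four of those retransmissions are exactly the unnecessary ones.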
This is quite interesting and difficult to explain.