# Can you explain this TCP sequence?

Hello, could anyone explain the behavior I observe below?

I have the following capture extract: a packet with seq=3020828 is sent out but never acknowledged. The packet is retransmitted multiple times, but the receiver keeps acknowledging the previous seq:

 57788 2018-07-16 15:36:20.552618000 10.245.40.74 10.245.54.13 TCP 2974 64613 -> 14004 [ACK] Seq=3020828 Ack=73535403 Win=65536 Len=2920

...

58376 2018-07-16 15:36:20.851770000 10.245.40.74 10.245.54.13 TCP 1514 [TCP Retransmission] 64613 -> 14004 [ACK] Seq=3020828 Ack=74313583 Win=1296 Len=1460

58378 2018-07-16 15:36:21.101721000 10.245.54.13 10.245.40.74 TCP 1350 14004 -> 64613 [PSH, ACK] Seq=74313583 Ack=3020828 Win=4096 Len=1296 [TCP segment of a reassembled PDU]

...

60992 2018-07-16 15:36:22.652682000 10.245.40.74 10.245.54.13 TCP 1514 [TCP Retransmission] 64613 -> 14004 [ACK] Seq=3020828 Ack=77762103 Win=1296 Len=1460

60994 2018-07-16 15:36:22.658427000 10.245.54.13 10.245.40.74 TCP 1514 14004 -> 64613 [ACK] Seq=77762103 Ack=3020828 Win=4096 Len=1460 [TCP segment of a reassembled PDU]


On the receiver side, I do see the packet coming in, but it is somehow ignored and the receiver keeps acknowledging the previous seq:

13878 2018-07-16 15:36:20.825430000 10.245.40.74 10.245.54.13 TCP 1514 [TCP Retransmission] 64613 -> 14004 [ACK] Seq=3020828 Ack=74313583 Win=1296 Len=1460

18810 2018-07-16 15:36:36.047313000 10.245.54.13 10.245.40.74 TCP 29254 14004 -> 64613 [ACK] Seq=114189103 Ack=3020828 Win=20480 Len=29200 [TCP segment of a reassembled PDU]

...

13991 2018-07-16 15:36:21.425465000 10.245.40.74 10.245.54.13 TCP 1514 [TCP Retransmission] 64613 -> 14004 [ACK] Seq=3020828 Ack=74834803 Win=1296 Len=1460

13993 2018-07-16 15:36:21.433178000 10.245.54.13 10.245.40.74 TCP 27794 14004 -> 64613 [ACK] Seq=74834803 Ack=3020828 Win=4096 Len=27740 [TCP segment of a reassembled PDU]


According to the ACK packets, there is enough room in the receive window, but somehow the packet is ignored. After 5 unsuccessful retransmissions, the sender eventually drops the connection.

Thanks.


Without a trace it is hard to say; can you share a trace with us: https://blog.packet-foo.com/2016/11/t.... Also, the receiver-side capture was taken with Segmentation Offloading enabled, which means it does not reflect the situation on the wire.

( 2018-07-18 11:01:20 +0000 )edit

The capture shows the connection established at 15:35:47 from port 64613 to 14004. The TCP conversation goes on for almost a minute, transferring a large amount of data, with occasional congestion in both directions.

The issue seems to start at 15:36:20.526157000: from this time on, the receiver seems to ACK seq=3020828 for each of the 5 packet retransmissions, until a connection RESET is received at 15:36:39.431508000 (when the Windows sender gives up and closes the connection).

( 2018-07-18 12:25:26 +0000 )edit

The capture file appears to have been taken on the machine with IP 192.168.50.157. Is there a corresponding capture file available from the other side, namely at 192.168.195.45?

( 2018-07-18 18:43:12 +0000 )edit

Yes, cmaynard. Note the captures were sanitized via TraceWrangler, so the IPs will appear randomized. I do have the corresponding capture from the connection peer: I will upload it and make it available tomorrow, when I can access the capture file again. Thanks!

( 2018-07-18 19:07:44 +0000 )edit

The problem is very strange. The missed 1460 byte segment w/seq # 3020828 is retransmitted with the next seq # correctly indicating 3022288; yet 192.168.195.45 continues to only ack 3020828. It would appear that 192.168.195.45 never receives it, but the capture file from the other side would confirm or deny that.

( 2018-07-18 19:17:31 +0000 )edit


Given the 2 supplied capture files ...

I used TraceWrangler to restore the original IPs, namely 192.168.195.45 => 10.247.166.16 and 192.168.50.157 => 172.28.12.164, to more easily compare them and to be able to refer to the same IP addresses in both capture files.

The captures were apparently taken with a snaplen of 54 bytes, which is a bit unfortunate as we don't have full frames for checksum verification. That said, I focused on a single packet, namely the 1st retransmission of the 1460 byte segment being sent from 10.247.166.16 to 172.28.12.164 with sequence # 3020828.

In the snd.pcapng file, this is frame #58376; in the recv.pcapng file, this is frame #13878. Comparing these 2 frames, the only differences are:

• TTL: It is 128 in snd.pcapng and 127 in recv.pcapng. This TTL difference was already noted by Packet_vlad.
• TCP Checksum: It is 0x3f4f (unverified) in snd.pcapng and 0x31fa (unverified) in recv.pcapng.
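
Such a comparison can also be scripted; below is a minimal sketch using scapy, assuming the file names and frame numbers quoted above (Wireshark frame numbers are 1-based):

```python
# Minimal sketch: compare the two on-the-wire copies of the retransmission.
from scapy.all import rdpcap, IP, TCP

snd = rdpcap("snd.pcapng")[58376 - 1]    # frame #58376 in the sender capture
rcv = rdpcap("recv.pcapng")[13878 - 1]   # frame #13878 in the receiver capture

for name, pkt in (("snd", snd), ("recv", rcv)):
    print(name,
          "ttl=", pkt[IP].ttl,               # 128 vs. 127 (one router hop)
          "seq=", pkt[TCP].seq,              # raw seq (Wireshark shows relative seq by default)
          "chksum=", hex(pkt[TCP].chksum))   # 0x3f4f vs. 0x31fa in this case
```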

Unfortunately, the TCP checksums can't be verified because of the snaplen used. That said, the TCP checksums should be identical, since the TTL isn't included in the 96-bit pseudo header that goes into the TCP checksum algorithm (a small sketch of that calculation follows the list below). The TCP checksums would/could differ, though, if:

• There was some NAT'ing going on, in which case the IP addresses could have changed. Is this the case?
• TCP Checksum Offloading is being done on the sender side, in which case 0x3f4f might not be the actual TCP checksum that was calculated and transmitted. If this could be the case, then it would be better to capture outside of the 10.247.166.16 host.
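
As a minimal, purely illustrative sketch of the pseudo-header point (not something posted in this thread): the IPv4 TCP checksum is computed over the source address, destination address, a zero byte, the protocol number and the TCP length, followed by the TCP header and payload. The TTL is simply not an input, so a routed hop that only decrements the TTL cannot change it.

```python
# Illustrative only: RFC 793/1071-style TCP checksum over the IPv4 pseudo header.
# Addresses and payload below are placeholders, not taken from the capture.
import socket
import struct

def ones_complement_sum16(data: bytes) -> int:
    """Sum 16-bit words with end-around carry (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length data
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total

def tcp_checksum(src_ip: str, dst_ip: str, tcp_segment: bytes) -> int:
    """Checksum over pseudo header + TCP segment. The checksum field inside
    the TCP header must already be zeroed; note there is no TTL anywhere."""
    pseudo = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),             # source IPv4 address
        socket.inet_aton(dst_ip),             # destination IPv4 address
        0,                                     # zero byte
        socket.IPPROTO_TCP,                    # protocol number (6)
        len(tcp_segment),                      # TCP length: header + payload
    )
    return (~ones_complement_sum16(pseudo + tcp_segment)) & 0xFFFF

# The result depends only on the IPs, protocol, length and the segment bytes,
# so decrementing the TTL from 128 to 127 in transit cannot change it.
dummy_segment = b"\x00" * 20 + b"payload"      # placeholder TCP header + data
print(hex(tcp_checksum("10.247.166.16", "172.28.12.164", dummy_segment)))
```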

It might be worthwhile to capture both sides again, but without applying a snaplen.

In any case, IF the TCP checksum is wrong when the TCP segment arrives at its destination, then this would explain why the receiver never ACK's it.


The checksum can be wrong, but it can also be due to offloading, at least in the trace with the huge frames.

( 2018-07-19 18:01:58 +0000 )edit

I thought about that, but how to explain such selectiveness? Only the original segment with sequence # 3020828 and all its subsequent retransmissions were dropped, even though each of them had a different checksum (because of different TCP ACK fields). At the same time, none of the pure ACKs from the same source was dropped.

( 2018-07-19 19:26:53 +0000 )edit

I guess that the packets get stuck in the queue of the stack, as we can see that the stack is under stress given the advertised RWin of 4096. So I agree with Vladimir: slowing down the traffic could be tried.

( 2018-07-19 20:43:40 +0000 )edit

Thanks, Maynard. This is a good hint - unfortunately, the original captures are complete (it's possible that TraceWrangler truncated the payloads).

In Wireshark, I enabled checksum verification on the receiver side, and the checksums are indeed correct according to Wireshark.

Additional note: yes, Checksum Offloading is enabled on both sides, so the checksum can only be verified in the capture taken on the receiving side.

( 2018-07-20 08:15:08 +0000 )edit

Thanks, @Christian_R and @Packet_vlad. I see the points about reducing/slowing down the traffic by tuning the TCP window scaling factor and the TCP buffer size, and they are indeed very pertinent. But both seem palliative measures: I would like to understand what the root cause of this TCP behavior is, and why back pressure is not applied on the sender side.

( 2018-07-20 08:33:17 +0000 )edit

Hi Luca, I suggested reducing the Window Scale factor as a measure to improve our observation ability. If you reduce the WS factor, you'll be able to see the actual (or at least more granular) RWIN size at each moment, not one aligned to 4096-byte steps, and this would allow further analysis (a small example of this arithmetic follows this comment). Of course, if it reveals an RWIN big enough to store a couple of MSS, the hypothesis is wrong and we'll search elsewhere.

( 2018-07-20 09:05:47 +0000 )edit
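
To spell out the granularity point with a quick sketch (the shift of 12 is inferred from the "1 = 4096 bytes" mapping observed later in this thread, not read from a SYN):

```python
# The on-the-wire window field is only 16 bits; the effective window is that
# raw value shifted left by the scale factor negotiated during the handshake.
def effective_window(raw_window: int, shift: int) -> int:
    return raw_window << shift

print(effective_window(1, 12))    # 4096  - the smallest non-zero advertisement
print(effective_window(5, 12))    # 20480 - the value seen ~14 seconds later
print(effective_window(16, 8))    # 4096  - a smaller shift would show the same
                                  #         window in much finer 256-byte steps
```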

If we are right and the systems are just under stress, then you can either slow down the connections or make the systems themselves faster (checking and tuning CPU load, system load, and so on; it can be a complex task).

( 2018-07-20 11:02:52 +0000 )edit

Thanks guys. We will retry with the TCP scaling and window size changed, then. What remains unclear to me is why, in such a stress condition, the TCP machinery fails to trigger the (much needed) congestion control.

( 2018-07-20 12:21:29 +0000 )edit

It is triggered. The congestion controls are in play:

1. We see a lot of Zero Windows.

2. Congestion avoidance comes into play with the retransmissions (#58376 for SrcPort 64613 and #91696 for SrcPort 14004); as they are retransmissions by timeout (RTO), they reduce the sending window dramatically, but only from this point on. Your RTT is small, too, so the BDP might still be high (a rough illustration follows below).

( 2018-07-20 13:01:05 +0000 )edit
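
As a rough back-of-the-envelope illustration of the BDP remark, with assumed figures (the thread only says the RTT is small; neither the link rate nor the exact RTT comes from the capture):

```python
# Bandwidth-delay product: how many bytes the path itself can hold in flight.
link_rate_bps = 1_000_000_000    # assumption: 1 Gbit/s path
rtt_s = 0.005                    # assumption: 5 ms round-trip time

bdp_bytes = link_rate_bps / 8 * rtt_s
print(int(bdp_bytes))            # 625000 bytes in flight, far more than a
                                 # 4096-byte receive window can absorb
```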

Luca, TCP assumes the sender will take care of: 1) not overloading the receiver; 2) not overloading the network. (2) is actually met: the sender stops sending further data after the packet was lost and no ACK came back. (1): the sender keeps retransmitting the packet because it sees no Zero Window, therefore it assumes the receiver has some room available. So TCP on the network works as intended.

The question is: what happened to the receiver so that it stops clearing its own receive buffer and keeps advertising the minimal calculated receive window of 4096?

Check it out: 192.168.50.157 seems to have its buffer filling up from packet No. 12801 through No. 13190, where it starts to report a Window Size of 1 (4096 bytes) and only opens it again 14+ seconds later (to 5 (20480 bytes), packet 17959). You can plot the Window Scaling graph and see this clearly (a small scripting sketch follows this comment).

All this time the ...(more)

( 2018-07-20 13:16:09 +0000 )edit
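
One possible way to script the graph mentioned above, sketched with scapy (the capture file name and the receiver port are the ones used in this thread; note that scapy reports the raw 16-bit window, which still has to be shifted by the negotiated scale factor):

```python
# Dump the receiver's advertised window over time from the receiver-side capture.
from scapy.all import rdpcap, TCP

packets = rdpcap("recv.pcapng")
t0 = float(packets[0].time)                     # time of the first captured frame

for pkt in packets:
    if TCP in pkt and pkt[TCP].sport == 14004:  # segments sent by the receiver
        print(f"{float(pkt.time) - t0:10.6f}s  raw_win={pkt[TCP].window}")
```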

I understand the receiver application is temporarily busy and not consuming data from the receive buffer, while it is perfectly able to send data out. We know this could be normal for this kind of application at startup. However, one of the two must be true: 1) the receive buffer is eventually full, in which case a Zero Window must be advertised; 2) the receive buffer is not full, in which case the incoming retransmitted packet must be (at least partially) accepted and acknowledged.

In the capture, each retransmission of len 1460 is not acknowledged, yet a receive window of 4096 is advertised. This is the only thing that still sounds inconsistent to me (see the quick arithmetic check after this comment).

( 2018-07-20 13:43:54 +0000 )edit
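
To make the inconsistency concrete with the numbers already quoted in this thread:

```python
# Numbers taken from the capture excerpts above.
seg_seq = 3020828             # sequence number of the unacknowledged segment
seg_len = 1460                # its payload length
advertised_window = 4096      # window advertised in the receiver's ACKs

# A 1460-byte segment fits comfortably in a 4096-byte window...
assert seg_len <= advertised_window

# ...so the receiver's next ACK would be expected to advance to:
print(seg_seq + seg_len)      # 3022288, yet it keeps ACKing 3020828
```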

The receiver application is a Java application using java.nio and non-blocking sockets. An incoming request could easily keep the application busy for a while, producing a significant amount of outgoing data as a result. So it's no surprise that its receive window shrinks from time to time -- but if the buffer is full, the window must go to 0 rather than silently letting the sender think the packet is lost.

( 2018-07-20 13:50:32 +0000 )edit