This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Client ignoring re-transmission request

I have come across an issue with backups failing from a NetBackup Windows client, and as usual the network is getting the blame. After a lot of troubleshooting I got it down to the attached flow trace. Has anyone come across a similar situation, where the client fails to send a packet and then ignores the requests for retransmission? It seems like a software bug to me.

The trace was taken on the Cisco switch port connected to the client. Near the top a packet is missing, and the server asks for it with Acknowledgment number: 1355609. The client just keeps transmitting the following packets, and there are a total of 44 ACKs before the client closes the connection. The server shows Calculated window size: 14398, no scaling (-1). The client got to Sequence number: 1639849 before resetting the connection.

It looks like the client is ignoring the TCP standards. I just wondered whether anyone else has come across a similar issue.

asked 08 Mar '17, 03:04

kirky755

Can you post a capture of the full conversation, from SYN to RST, e.g. on Cloudshark? If you worry about sensitive details, sanitize your PCAP first:

https://blog.packet-foo.com/2016/11/the-wireshark-qa-trace-file-sharing-tutorial/

(08 Mar '17, 03:08) Jasper ♦♦

I can't attach the whole trace: I was only capturing 96 bytes of each packet and still ended up with 30 x 40 MB files. Here is a link to the last 1000-odd packets. The kit is on the same network, and the IPs are randomised with TraceWrangler: https://www.dropbox.com/s/o7mu80z0hrvs35v/ad-02_end_anon.pcapng?dl=0

(08 Mar '17, 03:27) kirky755

One Answer:

Interesting. It looks like there is only one packet lost right before packet 1148, and the duplicate ACKs are ignored. You can see the SACK option accurately tracking the remaining incoming packets up to packet 1387, so basically the receiver got everything except for 1 segment.
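For context on why ignored duplicate ACKs are so surprising: a sender that follows RFC 5681 is expected to fast-retransmit the missing segment once the third duplicate ACK arrives, long before dozens of them pile up. The sketch below is a simplified illustration of that rule only. The function name and the plain list of ACK numbers are hypothetical, and the check treats any repeated ACK number as a duplicate, which is looser than the RFC's full definition.

```python
DUP_ACK_THRESHOLD = 3  # fast retransmit threshold from RFC 5681


def fast_retransmit_points(ack_numbers):
    """Return the indexes at which a sender would be expected to fast-retransmit.

    ack_numbers is a hypothetical list of ACK numbers in the order the
    sender receives them. An ACK that repeats the previous value counts
    as a duplicate (a simplification of the RFC 5681 definition).
    """
    points = []
    last_ack = None
    dup_count = 0
    for i, ack in enumerate(ack_numbers):
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                # Third duplicate: the segment starting at `ack` should be resent.
                points.append(i)
        else:
            last_ack = ack
            dup_count = 0
    return points


# 44 ACKs all carrying the acknowledgment number quoted in the question.
print(fast_retransmit_points([1355609] * 44))  # [3]
```

Under these assumptions the retransmission would be due at the fourth ACK, which is what makes the behaviour in this trace stand out.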

In 1394, the receiver tries again (after waiting close to 3 seconds, because no more packets came in that it could react to) to signal the missing segment. And now it gets really strange: the ACK in packet 1395 to packet 1394 has the exact same (old!) sequence number 1357057 as packet 1148, even though the sender should have used 1640225, which would be the correct number. Then it sends an ACK-RST with the same (incorrect) sequence number again, tearing down the connection.
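A quick bit of arithmetic on the numbers quoted so far shows how big the hole and the out-of-order data are. This is only a sketch: it assumes the relative sequence numbers quoted in this thread, and that everything between 1357057 and 1640225 really did arrive at the receiver, as the SACK blocks suggest. The variable names are made up for the example.

```python
# Relative sequence numbers quoted in the thread (assumed accurate).
missing_from = 1355609  # ACK number the server keeps repeating
first_after_gap = 1357057  # sequence number of packet 1148, right after the lost segment
expected_seq = 1640225  # sequence number the sender should have used in packet 1395

missing_len = first_after_gap - missing_from  # size of the hole
sacked_len = expected_seq - first_after_gap  # data received beyond the hole

print(missing_len)  # 1448
print(sacked_len)  # 283168
print(round(sacked_len / 1024, 1))  # 276.5
```

So the hole is 1448 bytes, which is consistent with a single lost segment, and roughly 283,000 bytes were delivered beyond it.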

This is really weird TCP behavior. My guess is that there is some device between sender and receiver messing up the sequence numbers (and maybe not handling the SACK option correctly). What I would try to do is to get a simultaneous capture on both ends to compare what each node sends and receives. I bet there is some modification happening in between that you can spot that way.

answered 08 Mar '17, 04:23

Jasper ♦♦

I agree, this is very unusual behaviour.

Just a couple more observations to add to the analysis:

Window Scaling is obviously in play, because we end up with 280+ KB of selectively ACKed client data.

Server packet #1394 carries SACK information - but also has 5 bytes of data payload. This is the only "data" that we see from the server.

Packet #1395 is a normal (200ms delayed) ACK from the client, acknowledging those 5 data bytes. This is the only evidence that the server's SACKs have been received at the client. Even if all the other server SACKs weren't received, we now know that at least this one was.

The client's Reset is around 1.5 secs after that ACK. Could the Reset have been triggered by the application, perhaps in response to those 5 bytes?

Can @kirky755 look at those 5 bytes and determine if they are an application layer error message? Or contain something that might provide a clue to all this?

The client/server TTLs are, respectively, 128/64. The minimum RTT is also just 0.1ms. These facts point to the client and server being very close and probably on the same subnet (despite the "wrangled" IP addresses).

Alternatively, could the "client" here be a middlebox as suggested by Jasper - and the real backup device is somewhere beyond that? If so, the middlebox would have to be terminating the TCP connection.

Regardless of all that, we do have a case where a sender fails to retransmit a missing packet despite 43+1 SACKs (plus the last 5 byte data packet also containing SACK information).

I'm very keen to know what is in those 5 bytes!
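As a footnote to the TTL observation above, here is a rough sketch of the reasoning. It assumes the common default initial TTLs of 64, 128 and 255, which is a convention rather than a guarantee, and the function name is hypothetical.

```python
COMMON_INITIAL_TTLS = (64, 128, 255)  # typical OS defaults; an assumption, not a rule


def estimate_hops(observed_ttl):
    """Guess (initial_ttl, hops) from an observed TTL.

    Picks the smallest common initial TTL that is >= the observed value
    and treats the difference as the number of routers crossed.
    """
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial, initial - observed_ttl
    raise ValueError("unreachable for TTLs in range")


print(estimate_hops(128))  # (128, 0)
print(estimate_hops(64))  # (64, 0)
```

Zero estimated hops in both directions fits the same-subnet reading, although a capture point sitting next to one of the hosts would show that host's TTL undecremented in any case.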

(08 Mar '17, 19:04) Philst

Jasper/Philst

Thank you for your concise updates. The kit was connected to just two Cisco L2 switches, so nothing should be modifying the packets. I have the trace from the other end, so I will take a look. You have given me so much information, thank you. I'm always trying to learn more about TCP traces. I will report back anything I can find.

(09 Mar '17, 01:35) kirky755

Just an update: I totally forgot that window scale is in the SYN packets. The window looks to be just over 1 MB when the trace fails, so there was far more room in the buffer before the expected ACK. I couldn't get any info on the last 5 bytes the server sends. I looked at a couple of other failed traces, but they didn't seem to have any data in the final ACKs, so I'm not sure whether the server got an error code away on this occasion only. The backup team informed me that the NetBackup version is end of life and has no support now! There is a plan to upgrade, but it would have been nice to raise a support call with Veritas. I may try to get the Unix team to switch off window scaling on the media server, to see whether a smaller window makes a difference to the missing-packet issue.

In the SYNs:

Client
Window scale: 8 (multiply by 256)
    Kind: Window Scale (3)
    Length: 3
    Shift count: 8
    [Multiplier: 256]

Server
Window scale: 7 (multiply by 128)
    Kind: Window Scale (3)
    Length: 3
    Shift count: 7
    [Multiplier: 128]
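For anyone reading along, this is how the shift counts above turn the 16-bit window field into the real receive window (RFC 7323). It is a minimal sketch with a hypothetical helper name. The raw value 14398 is the unscaled server window quoted in the question and is used here only as sample input, since the raw window in the packets nearest the failure may well differ.

```python
def effective_window(raw_window, shift_count):
    """Scale the 16-bit TCP window field by the shift count from the sender's SYN."""
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("window field is 16 bits")
    if not 0 <= shift_count <= 14:
        raise ValueError("RFC 7323 limits the shift count to 14")
    return raw_window << shift_count


# Server advertised shift count 7 (multiply by 128) in its SYN.
print(effective_window(14398, 7))  # 1842944
# Largest window possible with scaling switched off.
print(effective_window(0xFFFF, 0))  # 65535
```

The second line is the relevant one for the planned test: with window scaling off, the advertised window cannot exceed 65,535 bytes.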

Thanks for all the info, I learnt a lot investigating this.

(13 Mar '17, 08:42) kirky755