Hi Thomas, what are you doing? Posting a screenshot??? And making me read RFC1323 all over again??? ;-)
My understanding of RFC1323 is that each segment gets a TSval and that the TSval of the original segment must be used in the retransmission as well. This makes it possible to detect whether a segment is a retransmission or a new segment sent after the sequence numbers have wrapped.
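The wrap protection in RFC1323 (PAWS) comes down to comparing a segment's TSval with the most recent timestamp the receiver has recorded (TS.Recent), using 32-bit modular arithmetic. Here is a minimal sketch of that comparison. The helper names `ts_before` and `paws_accepts` are hypothetical, and the sketch deliberately ignores the idle-connection exception, RST handling and the rules for updating TS.Recent.

```python
TS_MOD = 2**32  # TSval and TS.Recent are 32-bit values
HALF = 2**31


def ts_before(a: int, b: int) -> bool:
    """True if timestamp a is earlier than b in 32-bit modular arithmetic."""
    diff = (a - b) % TS_MOD
    return diff != 0 and diff >= HALF


def paws_accepts(tsval: int, ts_recent: int) -> bool:
    """Simplified PAWS test: reject a segment whose TSval is older than TS.Recent.

    Hypothetical helper; ignores the idle exception, RSTs and TS.Recent updates.
    """
    return not ts_before(tsval, ts_recent)


print(paws_accepts(499841282, 499841231))  # True: a newer TSval passes
print(paws_accepts(499841231, 499841282))  # False: an older TSval is dropped
```

Because the comparison is modular, a TSval just after the counter wraps (e.g. 0) still counts as newer than one just before it (2**32 - 1).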
In this case, the retransmission of frame 142 (frame 2357) carries a new TSval. The client apparently did not save the original TSval (499841231) of the segment and put a new TSval (499841282) in the retransmission (the size of the TSval increase suggests a timestamp clock with 4 ms ticks). To the server, this looks like a new segment after the sequence numbers have wrapped, so it sends a DUP-ACK to indicate that it is missing data. As this is only a single DUP-ACK (fast retransmit needs three), no fast retransmit will be done, and the RTO timer keeps ticking until a retransmission is finally sent due to RTO (frame 11903).
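The tick estimate is simple arithmetic: divide the time between the two frames by the TSval difference. A minimal sketch follows. The helper `estimate_tick_ms` is hypothetical, and the 0.204 s gap used below is an assumed value for illustration only, since the frame timestamps are not quoted here. Read the real gap from the capture.

```python
TS_MOD = 2**32  # TSval is a 32-bit counter


def estimate_tick_ms(tsval_first: int, tsval_second: int, elapsed_s: float) -> float:
    """Estimate the sender's timestamp-clock tick in milliseconds (hypothetical helper)."""
    ticks = (tsval_second - tsval_first) % TS_MOD  # tolerate a counter wrap
    if ticks == 0:
        raise ValueError("identical TSvals: cannot estimate the tick length")
    return elapsed_s * 1000.0 / ticks


# TSvals of frame 142 (original) and frame 2357 (retransmission)
print((499841282 - 499841231) % TS_MOD)  # 51 ticks
# 0.204 s is an ASSUMED gap between the two frames, for illustration only
print(round(estimate_tick_ms(499841231, 499841282, 0.204), 3))  # 4.0
```

With 51 ticks between the two TSvals, a gap of roughly 200 ms is what would point to a 4 ms timestamp clock.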
So in this case I would say it is a bug in the client's TCP stack: it does not implement RFC1323 correctly. What is the client's OS? Are you able to disable the RFC1323 options for analysis's Sake (pun intended)? If so, I expect to see 200 ms RTOs on the client, which will be ignored by the server as the data has already been ACKed. So no difference in your performance loss, but at least some validation of my theory :-)
Besides solving the packet-loss issues, you might want to sit down with the application developer and the DB admin to increase the fetch size, as it looks like there is now a "one-row-response-at-a-time" policy. Since there is only one packet to send at a time, there are no follow-up segments to generate DUP-ACKs, so fast retransmits will never be sent and each loss results in an RTO of 1.4 sec. When data is transmitted more as a stream (due to larger responses), DUP-ACKs will trigger fast retransmits, which will make the effect of packet loss smaller. And as a bonus, the application might perform better on the WAN too!
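To see why the fetch size matters, here is a toy model of how a single loss gets recovered depending on how many segments follow it. It is a minimal sketch assuming the classic RFC 5681 rule (fast retransmit after three duplicate ACKs). `recovery_mechanism` is a hypothetical helper, and the model deliberately ignores SACK-based recovery, limited transmit and tail-loss probes.

```python
DUPACK_THRESHOLD = 3  # fast-retransmit threshold (RFC 5681)


def recovery_mechanism(delivered: list[bool]) -> str:
    """How the first lost segment of a burst gets recovered (hypothetical helper).

    delivered[i] is True if segment i of the burst reached the receiver.
    Simplified model: every delivered segment after the hole makes the
    receiver send one DUP-ACK; no SACK, limited transmit or tail-loss probe.
    """
    if all(delivered):
        return "no loss"
    first_loss = delivered.index(False)
    dupacks = sum(delivered[first_loss + 1:])  # each True counts as one DUP-ACK
    return "fast retransmit" if dupacks >= DUPACK_THRESHOLD else "RTO"


print(recovery_mechanism([False]))                          # RTO
print(recovery_mechanism([True, False, True, True, True]))  # fast retransmit
```

A one-row response is the `[False]` case: nothing follows the lost segment, so only the RTO (1.4 sec in this trace) can recover it. A larger fetch size moves the traffic toward the second case.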