We are seeing slow responses from our memcached servers to our application. The client-side capture shows a TCP retransmission from our application to the memcached server, and the capture on the server machine shows that the original packet was never received. Filtering by the sequence number, we see something like this:
Client
Time Length Info
10:09:41.496303 66 39126 -> 11211 [ACK] Seq=7040425, Ack=122270281 Win=182272 Len=0
10:09:41.497324 306 39126 -> 11211 [PSH, ACK] Seq=7040425, Ack=122270281 Win=182272 Len=240
10:09:41.697515 306 [TCP Retransmission] 39111 -> 11211 [PSH, ACK] Seq=7040425 Ack=122270281
Server
Time Length Info
10:09:41.511636 66 39126 -> 11211 [ACK] Seq=7040425, Ack=122270281 Win=182272 Len=0
10:09:41.706877 306 39126 -> 11211 [PSH, ACK] Seq=7040425 Ack=122270281 Win=182272 Len=240
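To put a number on the gap, the delay between the original send and the retransmission on the client can be worked out from the timestamps above. A minimal Python sketch (the variable names are only for illustration):

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"

# Client-side timestamps copied from the capture above.
original_send = datetime.strptime("10:09:41.497324", FMT)
retransmission = datetime.strptime("10:09:41.697515", FMT)

# Time the client waited before resending the same segment.
delay_ms = (retransmission - original_send).total_seconds() * 1000
print(f"retransmission delay: {delay_ms:.3f} ms")
```

That comes to roughly 200 ms. Assuming the client is a Linux machine, this is consistent with the kernel's minimum TCP retransmission timeout, so each lost packet would add about 200 ms to the response.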
The puzzling thing is that netstat shows no packet drops. Also, if we run memaslap, an optimized load-testing tool, with a similar configuration, we don't see any dropped packets and performance is consistently excellent.
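For comparing counters before and after a slow period, here is a rough sketch. It assumes a Linux host, where /proc/net/snmp exposes TCP counters such as RetransSegs as pairs of header and value lines. The function names `parse_proc_net_snmp` and `counter_delta` are hypothetical helpers, not part of any existing tool:

```python
def parse_proc_net_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The file comes in pairs: a header line of names, then a line of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        _, nums = values.split(":", 1)
        counters[proto] = dict(zip(names.split(), (int(n) for n in nums.split())))
    return counters


def counter_delta(before, after, proto="Tcp"):
    """Return the counters under `proto` that changed between two snapshots."""
    return {
        name: after[proto][name] - value
        for name, value in before[proto].items()
        if after[proto][name] != value
    }
```

The idea would be to read /proc/net/snmp once before and once after a test run, pass both strings through `parse_proc_net_snmp`, and look at what `counter_delta` reports. This only shows which counters moved. It does not by itself say where a packet was dropped.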
Our ops team suspects that the packet is being dropped on the client machine. That is, the application delivers the packet to the kernel, but the kernel drops it before sending it to the NIC. However, they don't know of a good way to confirm this. Is this a plausible explanation, and if so, is there some way to confirm it? I would expect there to be some kind of logging that could be enabled to show this.