# If a packet is sent but not received, can the problem not be the network?

We are having an issue with slow responses to our application from our memcached servers. The client capture shows a TCP retransmission from our application to the memcached server. The capture on the server machine shows the original packet was never received. Filtering by the sequence number, we see something like this:

Client

```
Time             Length  Info
10:09:41.496303      66  39126 -> 11211 [ACK] Seq=7040425, Ack=122270281 Win=182272 Len=0
10:09:41.497324     306  39126 -> 11211 [PSH, ACK] Seq=7040425, Ack=122270281 Win=182272 Len=240
10:09:41.697515     306  [TCP Retransmission] 39111 -> 11211 [PSH, ACK] Seq=7040425 Ack=122270281
```

Server

```
Time             Length  Info
10:09:41.511636      66  39126 -> 11211 [ACK] Seq=7040425, Ack=122270281 Win=182272 Len=0
10:09:41.706877     306  39126 -> 11211 [PSH, ACK] Seq=7040425 Ack=122270281 Win=182272 Len=240
```
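The client-side timestamps also hint at why the responses are slow. A quick sketch that computes the gap between the original segment and its retransmission, using the timestamps from the client capture (only compare timestamps taken on the same machine, since the two hosts' clocks are not necessarily in sync; `gap_seconds` is just an illustrative helper):

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"  # time-of-day format used in the captures


def gap_seconds(earlier: str, later: str) -> float:
    """Seconds elapsed between two capture timestamps taken on the same clock."""
    return (datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)).total_seconds()


# Client side: original PSH/ACK (Seq=7040425) and its retransmission.
retransmit_delay = gap_seconds("10:09:41.497324", "10:09:41.697515")
print(f"retransmitted after {retransmit_delay * 1000:.1f} ms")
```

The gap is about 200 ms. Assuming the client runs Linux, that is consistent with a timer-driven retransmission (Linux's minimum retransmission timeout is 200 ms), so every segment lost this way stalls its request for at least that long, however quickly the retry is then delivered.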


The puzzling thing is that netstat shows no dropped packets. Also, if we run memaslap, an optimized load-testing tool, with a similar configuration, we don't see any dropped packets and performance is consistently excellent.

Our ops team says it looks like the packet is being dropped on the client machine, i.e., the application delivers the packet to the kernel, but the kernel drops it before sending it to the NIC. However, they don't know of a good way to confirm this. Is this a plausible explanation, and if so, is there some way to confirm it? I would expect there to be some kind of logging that could be enabled to show this.
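One detail of the Linux transmit path narrows this down: outgoing packets are copied to capture sockets (tcpdump/dumpcap) when the kernel hands them to the driver, after the queueing discipline. A segment that appears in the client capture has therefore already left the qdisc, so a client-side loss from that point on would most likely show up in the interface's transmit counters. A minimal sketch, assuming Linux and its generic sysfs statistics under `/sys/class/net/<iface>/statistics/`; the helper names and the interface name `eth0` in the usage note are placeholders:

```python
from pathlib import Path
from typing import Dict, Iterable

# Generic transmit counters the Linux kernel exposes per interface in sysfs.
TX_COUNTERS = ("tx_packets", "tx_dropped", "tx_errors",
               "tx_fifo_errors", "tx_carrier_errors")


def read_counters(iface: str, names: Iterable[str] = TX_COUNTERS,
                  root: Path = Path("/sys/class/net")) -> Dict[str, int]:
    """Read the named counters from <root>/<iface>/statistics/."""
    stats_dir = root / iface / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in names}


def counter_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """How much each counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in before}
```

Usage: take `before = read_counters("eth0")`, run the application until a slow response shows up, then `print(counter_delta(before, read_counters("eth0")))`. Driver-specific counters, which many NICs keep in much finer detail, are available through `ethtool -S <iface>`.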



If I understand you correctly, you captured the packet on one system but not on the other. It stands to reason, then, that you need to look at all components in that network, be they physical or virtual.
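Comparing the two captures by hand works for one segment; a small script can do it for a whole run. A sketch, assuming each capture has been exported with tshark as whitespace-separated source port, sequence number and payload length, for example `tshark -r client.pcapng -o tcp.relative_sequence_numbers:FALSE -Y "tcp.dstport == 11211" -T fields -e tcp.srcport -e tcp.seq -e tcp.len`, so both sides report absolute sequence numbers (file names and helper names are placeholders). It counts copies instead of using a set because the original and its retransmission share a sequence number: a segment the client sent twice but the server saw once is exactly the loss in question.

```python
from collections import Counter
from typing import Iterable


def parse_fields(lines: Iterable[str]) -> Counter:
    """Count data-carrying segments from 'srcport seq len' lines exported by tshark."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        port, seq, length = (int(p) for p in parts)
        if length > 0:  # pure ACKs carry no data, so ignore them
            counts[(port, seq, length)] += 1
    return counts


def missing_on_server(client_lines: Iterable[str], server_lines: Iterable[str]) -> Counter:
    """Segments (with multiplicity) that the client captured more often than the server."""
    return parse_fields(client_lines) - parse_fields(server_lines)
```

Usage: `missing_on_server(open("client.txt"), open("server.txt"))` returns each lost `(port, seq, len)` with how many copies never arrived, which makes it easy to see whether losses cluster in time or on particular connections.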

For example, check whether the switch reports any packet drops, and check for drops on the relevant interface on both servers.
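On Linux, `/proc/net/dev` lists receive and transmit error and drop counters for every interface, virtual ones such as bonds, VLANs, bridges and veth pairs included, so it covers the "physical or virtual" components on each server in one place. A minimal sketch, assuming Linux, that prints only those counters; run it on both machines before and after a slow episode and compare (the switch still has to be checked through its own management interface):

```python
from pathlib import Path
from typing import Dict

# Column indexes after the "iface:" prefix in Linux /proc/net/dev:
# receive  = bytes packets errs drop fifo frame compressed multicast (0-7)
# transmit = bytes packets errs drop fifo colls carrier compressed  (8-15)
COLUMNS = {"rx_errs": 2, "rx_drop": 3, "tx_errs": 10, "tx_drop": 11}


def parse_proc_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Map every interface listed in /proc/net/dev text to its error/drop counters."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        stats[name.strip()] = {key: int(fields[idx]) for key, idx in COLUMNS.items()}
    return stats


if __name__ == "__main__":
    proc = Path("/proc/net/dev")
    if proc.exists():  # only present on Linux
        for iface, counters in parse_proc_net_dev(proc.read_text()).items():
            print(f"{iface:>12}  {counters}")
```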

With problems like this, you need to fill in the blanks: every component may be the cause of this sort of trouble.

I'm afraid there are no easy answers here.


Yes, the confusing part is that there are no packet drops on either server, and the problem doesn't happen with the load-testing tool, only with our application code. This leads the ops team to believe it's not a network problem. On the other hand, since the packet appeared in the client dump, it's not an application problem.

The part in the middle is the OS, I suppose, so I was wondering whether there's some way to debug things at that level.
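Short of kernel tracing, a cheap way to see what the OS itself recorded is to diff the protocol counters that `netstat -s` reports, which on Linux come from `/proc/net/snmp` and `/proc/net/netstat`. A sketch, assuming Linux, that snapshots both files around a slow run and keeps only the counters that changed; on the client one would expect retransmission-related counters such as `Tcp.RetransSegs`, `TcpExt.TCPTimeouts` or `TcpExt.TCPLossProbes` to move. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable

# Files behind `netstat -s` on Linux: header/value line pairs per protocol.
SNMP_FILES = ("/proc/net/snmp", "/proc/net/netstat")


def parse_snmp(text: str) -> Dict[str, int]:
    """Turn 'Proto: name ...' / 'Proto: value ...' line pairs into {'Proto.name': value}."""
    counters: Dict[str, int] = {}
    lines = text.splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, _, names = header.partition(":")
        vproto, _, nums = values.partition(":")
        if proto != vproto:
            raise ValueError(f"unpaired lines for {proto!r} and {vproto!r}")
        for name, num in zip(names.split(), nums.split()):
            counters[f"{proto}.{name}"] = int(num)
    return counters


def snapshot(paths: Iterable[str] = SNMP_FILES) -> Dict[str, int]:
    """Read and merge whichever counter files exist on this machine."""
    merged: Dict[str, int] = {}
    for path in map(Path, paths):
        if path.exists():
            merged.update(parse_snmp(path.read_text()))
    return merged


def changed(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Counters whose value moved between two snapshots, with the difference."""
    return {k: v - before.get(k, 0) for k, v in after.items() if v != before.get(k, 0)}
```

Usage: `before = snapshot()`, reproduce a slow response, then `print(changed(before, snapshot()))`. The `nstat` utility from iproute2 does the same kind of delta reporting out of the box, which may be simpler to hand to an ops team.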

(2021-02-01 20:17:38 +0000)