If a packet is sent but not received, can the problem not be the network?

asked 2021-02-01 18:12:57 +0000

slushi

We are having an issue with slow responses to our application from our memcached servers. The client capture shows a TCP retransmission from our application to the memcached server. The capture on the server machine shows the original packet was never received. Filtering by the sequence number, we see something like this:

Client

Time                            Length          Info
10:09:41.496303                     66          39126 -> 11211 [ACK] Seq=7040425, Ack=122270281 Win=182272 Len=0
10:09:41.497324                    306          39126 -> 11211 [PSH, ACK] Seq=7040425, Ack=122270281 Win=182272 Len=240
10:09:41.697515                    306          [TCP Retransmission] 39126 -> 11211 [PSH, ACK] Seq=7040425 Ack=122270281

Server

Time                            Length          Info
10:09:41.511636                     66          39126 -> 11211 [ACK] Seq=7040425, Ack=122270281 Win=182272 Len=0
10:09:41.706877                    306          39126 -> 11211 [PSH, ACK] Seq=7040425 Ack=122270281 Win=182272 Len=240
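
The timing in the two listings can be checked with a little arithmetic. The sketch below uses only the timestamps shown above and works out how long the client waited before retransmitting, and how long after the ACK the server saw the data segment. The helper name `gap_ms` is made up for illustration, and the comparison stays within each host's own capture because nothing here says the two clocks are synchronized.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"  # time-of-day format used in the capture listings


def gap_ms(earlier: str, later: str) -> float:
    """Return the gap between two same-day capture timestamps in milliseconds."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds() * 1000.0


# Client capture: original data segment vs. its retransmission.
client_gap = gap_ms("10:09:41.497324", "10:09:41.697515")

# Server capture: bare ACK vs. the only data segment that arrived.
server_gap = gap_ms("10:09:41.511636", "10:09:41.706877")

print(f"client: original -> retransmission = {client_gap:.3f} ms")
print(f"server: ACK -> data segment        = {server_gap:.3f} ms")
```

Both gaps come out close to 200 ms. That would fit a retransmission timer expiring on the client rather than a fast retransmit, but this reading is an assumption: the capture alone does not show which timer fired.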

The puzzling thing is netstat shows no packet drop. Also if we run memaslap, an optimized load testing tool, with similar configuration we don't see any dropped packets and performance is consistently excellent.

Our ops team is saying it seems like the packet is being dropped on the client machine. I.e., the application is delivering the packet to the kernel, but the kernel drops the packet before sending it to the NIC. However, they don't know of a good way to confirm this. Is this a plausible explanation, and if so is there some way to confirm? I would think there would be some kind of logging that could be enabled to show this.


1 Answer


answered 2021-02-01 19:03:58 +0000

hugo.vanderkooij

If I understand you correctly, the packet shows up in the capture on one system but not on the other. It then stands to reason that you need to look at every component on the path between them, physical or virtual.

For example, check whether the switch reports packet drops, and look at the drop counters on the relevant interface of both servers.
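
As a rough sketch of the interface check, assuming both machines run Linux: the kernel publishes per-interface counters in `/proc/net/dev`, with eight receive columns followed by eight transmit columns. The function names below are hypothetical, and the script only reads counters; it does not say where a packet was lost.

```python
from pathlib import Path

# Column order of /proc/net/dev on Linux: 8 receive fields, then 8 transmit fields.
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]


def parse_net_dev(text: str) -> dict:
    """Parse /proc/net/dev-style text into {interface: {"rx": {...}, "tx": {...}}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        numbers = [int(v) for v in values.split()]
        if len(numbers) != 16:
            continue  # unexpected layout, skip rather than guess
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, numbers[:8])),
            "tx": dict(zip(TX_FIELDS, numbers[8:])),
        }
    return stats


def drop_summary(stats: dict) -> dict:
    """Reduce parsed stats to the error and drop counters per interface."""
    return {
        name: {
            "rx_drop": s["rx"]["drop"], "rx_errs": s["rx"]["errs"],
            "tx_drop": s["tx"]["drop"], "tx_errs": s["tx"]["errs"],
        }
        for name, s in stats.items()
    }


if __name__ == "__main__":
    proc_file = Path("/proc/net/dev")
    if proc_file.exists():  # only present on Linux
        for name, counters in drop_summary(parse_net_dev(proc_file.read_text())).items():
            print(name, counters)
```

Running it on the client and on the memcached server before and after a slow request, and comparing the numbers for the interface that carries the traffic, would show whether any of these counters moved.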

With problems like this you need to fill in the blanks yourself: every component may be the cause of this sort of trouble.

I'm afraid there are no easy answers here.


Comments

Yes, the confusing part is there are no packet drops on either server, and the problem doesn't happen with the load testing tool, only our application code. This leads the ops team to believe it's not a network problem. On the other hand, since the packet appeared in the client dump, it's not an application problem.

The part in the middle is the OS I suppose, so I was wondering if there's some way to debug things at that level.

slushi (2021-02-01 20:17:38 +0000)
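
One possible starting point for looking at the OS layer, assuming a Linux host: the kernel keeps protocol counters in `/proc/net/snmp`, where each protocol has a line of counter names followed by a line of values. The sketch below parses that layout and reports which counters changed between two snapshots. The function names are hypothetical, and the counters are host-wide, so they can hint at retransmissions but cannot attribute them to one connection.

```python
from pathlib import Path


def parse_snmp(text: str) -> dict:
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    headers = {}
    tables = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        proto, rest = line.split(":", 1)
        fields = rest.split()
        if proto not in headers:
            headers[proto] = fields  # first line per protocol holds the names
        else:
            names = headers.pop(proto)  # second line holds the values
            tables[proto] = {n: int(v) for n, v in zip(names, fields)}
    return tables


def changed_counters(before: dict, after: dict, proto: str = "Tcp") -> dict:
    """Return {counter: delta} for counters of one protocol that changed."""
    old = before.get(proto, {})
    new = after.get(proto, {})
    return {k: new[k] - old.get(k, 0) for k in new if new[k] != old.get(k, 0)}


if __name__ == "__main__":
    proc_file = Path("/proc/net/snmp")
    if proc_file.exists():  # only present on Linux
        tcp = parse_snmp(proc_file.read_text()).get("Tcp", {})
        print("OutSegs:", tcp.get("OutSegs"), "RetransSegs:", tcp.get("RetransSegs"))
```

Taking one snapshot before and one after a slow request and passing both to `changed_counters` shows whether `RetransSegs` rose on the client during that window.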
