Advice on consistent TCP Retransmissions

asked 2020-02-12 22:53:19 +0000

weathercoach

updated 2020-02-12 22:54:02 +0000

Hello. I have a micro service running in AWS ECS (Docker) which queries an Active Directory LDAP server over an AWS PrivateLink in another AWS account.

Recently I was looking at the traffic between the micro service and the LDAP server to understand some problems the micro service was reporting. We have since addressed the original problem, which was the micro service trying to use connections that had already been torn down. While capturing packets for that problem, I noticed that every single connection from the micro service to the LDAP server resulted in some TCP Retransmissions and TCP Duplicate ACKs.

While I am under the hood, so to speak, I want to get to the bottom of this problem, if in fact it is one. I have attached a redacted screenshot of a few connections from the micro service to the LDAP server. The players here are:

172.17.0.4 - micro service (running in a Docker container)
10.52.37.72 - eth0 on the EC2 instance that the Docker container is running on. 172.17.0.4 is on a bridge interface of 10.52.37.72.
10.52.37.24 - the IP we use to communicate with the LDAP server. The LDAP server is on the other side of 10.52.37.24 and is controlled by a different org in my company.

What I'm not sure about is whether a problem actually exists, or whether Wireshark only makes it look like one because I am capturing on the EC2 instance and the packets pass through two interfaces on the host. Does Wireshark see the traffic from the Docker bridge interface and the same traffic on the actual eth0 interface as duplicates and retransmissions?

If this packet capture does represent an actual problem, I'm not sure where to go next. From some reading on this, it seems that the IP identification being the same while the packet TTL decrements would indicate a routing problem of some sort. Anyhow, I was hoping someone could tell me whether this looks like an actual problem or not. The application in question is still suffering from some problems, but based on what I am seeing on the wire, I think they are related to the application's logic not handling valid LDAP failures gracefully. Previously a true problem did exist that I could see on the wire, but now I'm not sure whether the current state of things has anything to do with the network itself.

TIA. G. The supporting image is here: https://pasteboard.co/IUp2bof.png


Comments

Which interface did you make the capture on?
What is the output of tcpdump -D?

Chuckc (2020-02-13 01:10:21 +0000)

I used tcpdump -i any, so it captured across all the interfaces. The output of tcpdump -D is:

sudo tcpdump -D
1.eth0
2.docker0
3.vethccbb0c1
4.nflog (Linux netfilter log (NFLOG) interface)
5.nfqueue (Linux netfilter queue (NFQUEUE) interface)
6.veth4d7157f
7.veth5ef4e4f
8.veth6fa9032
9.vethca8aa09
10.veth877ee70
11.any (Pseudo-device that captures on all interfaces)
12.lo

In the above example, veth5ef4e4f is the virtual ethernet interface for the Docker container, and it is attached to the bridge interface docker0. My understanding is that traffic from the veth traverses docker0 and then eth0, as the default gateway for the host is reachable via the eth0 interface.

I think what I'll do is run a non-Docker client bound to eth0, perform an interaction with the LDAP server, and capture the packets to see whether those show the same TCP Retransmissions and TCP Dup ACKs.

weathercoach (2020-02-13 16:47:11 +0000)

So I ran queries using the ldapsearch binary directly on the host (not in Docker) and captured the traffic between eth0 and the LDAP server in question. I do not see any TCP Retransmissions or TCP Dup ACKs. So I'm thinking I can assume that the TCP Retransmissions and TCP Dup ACKs were an artifact of how I originally captured the packets and of how Wireshark displays them. Does that seem accurate?

weathercoach (2020-02-13 17:11:31 +0000)

Yes. Seeing three copies of each packet makes sense when capturing on all interfaces.
You can filter on frame.interface_id to see the traffic for a single interface.

Chuckc (2020-02-13 18:07:54 +0000)
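
One rough way to check this conclusion against an exported packet list is to group frames by IP identification and TCP sequence number, then see whether the copies came from different interfaces. The sketch below is only an illustration under stated assumptions: the Pkt record, the capture_duplicates helper, and the sample values are hypothetical and are not taken from the capture in the question. It assumes the IP ID and sequence number are unchanged between the veth and eth0, and it deliberately ignores the source address, since Docker may rewrite it on the way out of the host.

```python
from collections import defaultdict
from typing import NamedTuple


class Pkt(NamedTuple):
    """Hypothetical record for one captured frame."""
    iface: str   # name of the interface the frame was captured on
    src: str     # source IP address as seen on that interface
    ip_id: int   # IP identification field
    ttl: int     # IP time to live
    seq: int     # TCP sequence number


def capture_duplicates(packets):
    """Return groups of frames that share IP ID and TCP seq
    but were captured on more than one interface."""
    groups = defaultdict(list)
    for p in packets:
        groups[(p.ip_id, p.seq)].append(p)
    return {
        key: frames
        for key, frames in groups.items()
        if len({f.iface for f in frames}) > 1
    }


# Made-up sample: one packet seen on three interfaces, one seen once.
sample = [
    Pkt("veth5ef4e4f", "172.17.0.4", 4660, 64, 1000),
    Pkt("docker0", "172.17.0.4", 4660, 64, 1000),
    Pkt("eth0", "10.52.37.72", 4660, 63, 1000),
    Pkt("eth0", "10.52.37.72", 4661, 63, 1200),
]

dups = capture_duplicates(sample)
for key, frames in dups.items():
    print(key, [(f.iface, f.ttl) for f in frames])
```

A group that spans several interfaces, with the TTL dropping by one at the routed hop, is most likely the same packet observed at successive points inside the host rather than a retransmission by the sender. A genuine retransmission would typically appear more than once on the same interface.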