TCP Retransmissions

asked 2020-10-19 16:46:16 +0000

karl

updated 2020-10-19 17:43:20 +0000

Update: Here is a non-HTTPS capture. This one was really bad. It was a single curl request to http://www.bom.gov.au/

No.     Time           Source                Destination           Protocol Length Info
      1 0.000000       32.23.109.22          23.54.57.70           TCP      66     24434 → 80 [SYN] Seq=0 Win=42340 Len=0 MSS=1460 SACK_PERM=1 WS=256
      2 0.000998       23.54.57.70           32.23.109.22          TCP      66     80 → 24434 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1 WS=128
      3 0.000051       32.23.109.22          23.54.57.70           TCP      54     24434 → 80 [ACK] Seq=1 Ack=1 Win=42496 Len=0
      4 0.000119       32.23.109.22          23.54.57.70           HTTP     132    GET / HTTP/1.1 
      5 0.000990       23.54.57.70           32.23.109.22          TCP      60     80 → 24434 [ACK] Seq=1 Ack=79 Win=29312 Len=0
      6 0.107525       23.54.57.70           32.23.109.22          HTTP     2974   [TCP Previous segment not captured] Continuation
      7 0.000030       32.23.109.22          23.54.57.70           TCP      66     [TCP Dup ACK 3#1] 24434 → 80 [ACK] Seq=79 Ack=1 Win=42496 Len=0 SLE=7301 SRE=10221
      8 0.004211       23.54.57.70           32.23.109.22          TCP      1514   [TCP Retransmission] 80 → 24434 [ACK] Seq=1 Ack=79 Win=29312 Len=1460
      9 0.000038       32.23.109.22          23.54.57.70           TCP      66     24434 → 80 [ACK] Seq=79 Ack=1461 Win=41216 Len=0 SLE=7301 SRE=10221
     10 12.953972      23.54.57.70           32.23.109.22          TCP      1262   [TCP Retransmission] 80 → 24434 [ACK] Seq=1461 Ack=79 Win=29312 Len=1208
     11 0.000041       32.23.109.22          23.54.57.70           TCP      66     24434 → 80 [ACK] Seq=79 Ack=2669 Win=40704 Len=0 SLE=7301 SRE=10221
     12 0.001026       23.54.57.70           32.23.109.22          HTTP     1262   [TCP Previous segment not captured] Continuation
     13 0.000000       23.54.57.70           32.23.109.22          HTTP     2470   [TCP Previous segment not captured] Continuation
     14 0.000032       32.23.109.22          23.54.57.70           TCP      74     [TCP Dup ACK 11#1] 24434 → 80 [ACK] Seq=79 Ack=2669 Win=40704 Len=0 SLE=30661 SRE=31869 SLE=7301 SRE=10221
     15 0.000011       32.23.109.22          23.54.57.70           TCP      82     [TCP Dup ACK 11#2] 24434 → 80 [ACK] Seq=79 Ack=2669 Win=40704 Len=0 SLE=33077 SRE=35493 SLE=30661 SRE=31869 SLE=7301 SRE=10221
     16 0.001006       23.54.57.70           32.23.109.22          TCP      306    [TCP Fast Retransmission] 80 → 24434 [ACK] Seq=2669 Ack=79 Win=29312 Len=252 [TCP segment of a reassembled PDU]
     17 0.000038       32.23.109.22          23.54.57.70           TCP      82     24434 → 80 [ACK] Seq=79 Ack=2921 Win=40704 Len=0 SLE=33077 SRE=35493 SLE=30661 SRE=31869 SLE=7301 SRE=10221
     18 0.205820 ...

Comments

Looks good up till frame 1328.
Protocol column shows TLSv1.3
You'll want to look inside to see what the TLS setup is doing.
See this past question

Chuckc ( 2020-10-19 17:05:31 +0000 )

Are you making the capture on the client system? Any chance of capturing in the middle?
Frame 6 - TCP Previous segment not captured
Frame 6 - Length 2974 (NIC doing offload?)
Frame 7 - Dup ACK; SACK SLE=7301 SRE=10221
A clean capture showing what's happening on the wire would help.

Chuckc ( 2020-10-19 18:19:53 +0000 )
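The SACK arithmetic behind the frame 7 observation can be sketched in Python. This is only an illustration and not part of the original thread: `sack_holes` is a hypothetical helper that parses the Info column text as Wireshark prints it, and it assumes relative sequence numbers as shown in the capture above.

```python
import re


def sack_holes(info):
    """Return the (start, end) sequence ranges the receiver is still missing.

    `info` is the Info column of an ACK, for example
    'Seq=79 Ack=1 Win=42496 Len=0 SLE=7301 SRE=10221'.
    Ranges are half-open: start is included, end is not.
    """
    # The cumulative ACK: everything below this number has arrived in order.
    ack = int(re.search(r"\bAck=(\d+)", info).group(1))
    # Each SLE/SRE pair is one SACK block: bytes [SLE, SRE) arrived out of order.
    blocks = sorted(
        (int(sle), int(sre))
        for sle, sre in re.findall(r"SLE=(\d+) SRE=(\d+)", info)
    )
    holes = []
    cursor = ack
    for sle, sre in blocks:
        if sle > cursor:
            holes.append((cursor, sle))
        cursor = max(cursor, sre)
    return holes


frame7 = "24434 → 80 [ACK] Seq=79 Ack=1 Win=42496 Len=0 SLE=7301 SRE=10221"
for start, end in sack_holes(frame7):
    print(f"missing bytes {start}..{end - 1} ({end - start} bytes)")
```

For frame 7 this reports a single hole of 7300 bytes, which is five full 1460-byte segments that the capturing host never saw before the data in frame 6 arrived.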

Could you please share a trace file? https://blog.packet-foo.com/2016/11/t...

Analyzing from a screenshot is only practical in a few cases.

Christian_R ( 2020-10-19 21:05:42 +0000 )

Thanks, I really appreciate the help. I just took this one. It was really bad.

http://45.118.133.252/src.pcap

http://45.118.133.252/dst.pcap

The dst is a Linode instance I set up, and the latency between the two is <1ms (the other server isn't at Linode, but they are both in Singapore, where everything tends to be close to everything else). The src, the problem one, is a relatively beefy server which isn't doing too much.

karl ( 2020-10-20 02:17:30 +0000 )

dst.pcap was captured on the server?
Frame 6 - len=7354 - is split up and put on the wire by the NIC.

src.pcap is showing the smaller frames but some have been missed.
Frame 6 - len=1514 - previous segment not captured.

Other than continuing to work your way in from both ends, maybe look at the items in @SYN-bit's answer.

Chuckc ( 2020-10-20 14:54:41 +0000 )
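The frame lengths in this comment can be checked with a little arithmetic. The sketch below is an illustration under stated assumptions: `split_offloaded_frame` is a hypothetical helper, the per-frame overhead is taken as 54 bytes (Ethernet 14 + IPv4 20 + TCP 20, no TCP options), and the MSS of 1460 comes from the SYN packets in the capture.

```python
# Assumed header overhead: Ethernet (14) + IPv4 (20) + TCP without options (20).
ETH_IP_TCP_HEADERS = 14 + 20 + 20


def split_offloaded_frame(frame_len, mss=1460):
    """Return the on-the-wire frame lengths for one oversized captured frame.

    A capture taken on a host with segmentation offload shows the large
    buffer handed to the NIC; the NIC then cuts the payload into
    MSS-sized segments and adds headers to each one.
    """
    payload = frame_len - ETH_IP_TCP_HEADERS
    full, rest = divmod(payload, mss)
    sizes = [mss] * full + ([rest] if rest else [])
    return [size + ETH_IP_TCP_HEADERS for size in sizes]


print(split_offloaded_frame(7354))  # frame 6 in dst.pcap
print(split_offloaded_frame(2974))  # frame 6 in the capture posted in the question
```

Under these assumptions the 7354-byte frame in dst.pcap corresponds to five 1514-byte frames on the wire, and the 2974-byte frame in the question corresponds to two, which matches the 1514-byte frames seen in src.pcap.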

1 Answer


answered 2020-10-20 07:02:52 +0000

SYN-bit

So if I understand your problem correctly, when only serving incoming connections, the server is doing great. Once you start initiating outgoing connections pulling in data from the outside, you see a lot of TCP retransmissions?

If so, I suspect a duplex mismatch or maybe a dirty fiber that is causing packet loss only on the receiving side (from the server's perspective). Loss of ACK packets usually does not introduce many problems, so that could be the reason you were not experiencing any problems before.

How is this server connected? Do you have multiple servers in the same network segment? And if so, is this the only one having these problems?
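One way to start checking the local side of the link on a Linux server is to read the interface attributes the kernel exposes under `/sys/class/net`. The sketch below is not from the original answer: `read_link_state` is a hypothetical helper, the interface name is an assumption, and it only shows the local settings and receive counters. A duplex mismatch can only be confirmed by comparing against the switch port as well.

```python
from pathlib import Path


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Read negotiated speed/duplex and receive error counters for `iface`.

    Values that are missing or unreadable (for example when the link is
    down or the driver does not report them) come back as None.
    """
    base = Path(sysfs_root) / iface

    def read(rel):
        try:
            return (base / rel).read_text().strip()
        except OSError:
            return None

    def read_int(rel):
        value = read(rel)
        return int(value) if value is not None else None

    return {
        "speed_mbps": read("speed"),
        "duplex": read("duplex"),
        # Rising CRC or general receive errors point at a physical-layer problem.
        "rx_errors": read_int("statistics/rx_errors"),
        "rx_crc_errors": read_int("statistics/rx_crc_errors"),
        "rx_dropped": read_int("statistics/rx_dropped"),
    }
```

Calling `read_link_state("eth0")` twice a few minutes apart (with `eth0` replaced by the real interface name) and comparing the counters shows whether receive errors are still accumulating.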


Comments

To give an update, the issue was with our hosting provider. Though they still haven't provided an explanation, they did resolve it.

In case this helps anyone else, this was affecting other related servers, but it _seemed_ like it was affecting them less. However, I didn't want to mess around as much with those other servers, so I can't say for sure how they were behaving.

What finally triggered me to open a support ticket was the output from:

sudo mtr -4 --tcp --port 443 --report --report-cycles 10 google.com

This showed substantial but inconsistent latency within and at the border of our provider. The stddev was quite extreme (over 1 second, with a typical value being in single-digit milliseconds). Frankly, I didn't even know that mtr could run in TCP mode (normal mtr showed no issue, which is why I initially believed the problem was ours). From ...

karl ( 2020-10-22 04:47:07 +0000 )
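Where mtr is not available, a rough end-to-end version of the same measurement can be sketched in Python by timing TCP handshakes and looking at the spread. This is an assumption-laden illustration rather than a replacement for mtr: `connect_time_ms`, `sample`, and `summarize` are hypothetical helpers, and unlike mtr they give no per-hop breakdown.

```python
import socket
import statistics
import time


def connect_time_ms(host, port, timeout=3.0):
    """Time one TCP three-way handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def sample(host, port, count=10):
    """Collect `count` handshake timings, one connection at a time."""
    return [connect_time_ms(host, port) for _ in range(count)]


def summarize(samples_ms):
    """Summarize timings; a large stddev relative to the median signals jitter."""
    return {
        "min": min(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
        "stddev": statistics.stdev(samples_ms) if len(samples_ms) > 1 else 0.0,
    }
```

For example, `summarize(sample("google.com", 443))` mirrors the ten TCP cycles to port 443 in the mtr command above. A median of a few milliseconds next to a standard deviation near one second would be the same symptom described here.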
