This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

TCP handshake question

0

HI

I am trying to troubleshoot some issues on a Linux Apache web server (very slow response several times during the day). For testing i am trying to load a page on my laptop (192.168.249.2) from the web server (172.18.26.41).

I used wireshark and noticed some TCP retransmissions and TCP DUP ACKs, see below, I ran the trace on both ends and saw the same results so no packets are going missing. I cant see anything out of sequence here and i don't understand why 26.41(web) is sending a TCP retransmission (22461) if its already received an ACK in packet 21453.

It only does this when there is a heavy load on the server, could this be an application (Apache) issue where port 80 becomes unavailable, could wireshark have seen packet 21453 come in but if apache is busy and port 80 becomes unavailable 26.41 issues a re transmission??

There doesn't appear to be any congestion on the network, we have 100mb link into the MPLS and a 1gb at the other end and the highest utilisation is approx 30%.

21432   0.000   192.168.249.2   172.18.26.41    TCP 55708 > http [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1 TSval=70329226 TSecr=0

21433 0.000 172.18.26.41 192.168.249.2 TCP http > 55708 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=33418600 TSecr=70329226

21453 0.000 192.168.249.2 172.18.26.41 TCP 55708 > http [ACK] Seq=1 Ack=1 Win=65160 Len=0 TSval=70329226 TSecr=33418600

22461 0.000 172.18.26.41 192.168.249.2 TCP [TCP Retransmission] http > 55708 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=33418876 TSecr=70329226

22470 0.000 192.168.249.2 172.18.26.41 TCP [TCP Dup ACK 21453#1] 55708 > http [ACK] Seq=1 Ack=1 Win=65160 Len=0 TSval=70329337 TSecr=33418600 SLE=0 SRE=1

24541 0.000 172.18.26.41 192.168.249.2 TCP [TCP Retransmission] http > 55708 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=33419377 TSecr=70329337

24546 0.000 192.168.249.2 172.18.26.41 TCP [TCP Dup ACK 21453#2] 55708 > http [ACK] Seq=1 Ack=1 Win=65160 Len=0 TSval=70329537 TSecr=33418600 SLE=0 SRE=1

26976 0.000 172.18.26.41 192.168.249.2 TCP [TCP Retransmission] http > 55708 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=33420378 TSecr=70329537

26979 0.000 192.168.249.2 172.18.26.41 TCP [TCP Dup ACK 21453#3] 55708 > http [ACK] Seq=1 Ack=1 Win=65160 Len=0 TSval=70329937 TSecr=33418600 SLE=0 SRE=1

32111 0.000 192.168.249.2 172.18.26.41 TCP 55708 > http [FIN, ACK] Seq=1 Ack=1 Win=65160 Len=0 TSval=70330565 TSecr=33418600

32458 0.000 192.168.249.2 172.18.26.41 TCP [TCP Retransmission] 55708 > http [FIN, ACK] Seq=1 Ack=1 Win=65160 Len=0 TSval=70330595 TSecr=33418600

33001 0.000 192.168.249.2 172.18.26.41 TCP [TCP Retransmission] 55708 > http [FIN, ACK] Seq=1 Ack=1 Win=65160 Len=0 TSval=70330655 TSecr=33418600

33008 0.000 172.18.26.41 192.168.249.2 TCP http > 55708 [ACK] Seq=1 Ack=2 Win=14480 Len=0 TSval=33422175 TSecr=70330655

Any help would be helpful.

asked 12 Jun ‘14, 03:18

navs_123456's gravatar image

navs_123456
1112
accept rate: 0%

edited 12 Jun ‘14, 03:20

grahamb's gravatar image

grahamb ♦
19.8k330206

alt text

(12 Jun ‘14, 03:20) navs_123456
1

It’s time consuming to do an analysis based on text exports and screen shots. Can you please post a pcap file somewhere, like Google drive, dropbox or cloudshark.org?

(12 Jun ‘14, 03:30) Kurt Knochner ♦

hi, try this https://dl.dropboxusercontent.com/u/48120370/172.18.26.41-DuringSlow-11am-spec.pcap , i have cleaned the file so the packet numbers have changed.

(12 Jun ‘14, 03:55) navs_123456


4 Answers:

1

After doing a googling found that it could be causing because of TCP_DEFER_ACCEPT option,Please check similar behaviour in this link, http://webcache.googleusercontent.com/search?q=cache:zoA_A0cao28J:https://labs.ripe.net/Members/gih/the-curious-case-of-the-crooked-tcp-handshake+&cd=9&hl=en&ct=clnk&gl=in

answered 14 Jun '14, 00:35

kishan%20pandey's gravatar image

kishan pandey
221282936
accept rate: 28%

Very good find!!

Thanks a lot for that link for several reasons

  • It shows a similar behavior, like the one described in the question.
  • it could explain what caused the problem (see my next answer)
  • I did not know about TCP_DEFER_ACCEPT and I have never seen that 'in the wild'. Cool!
  • I did not know RIPE labs. They have some interesting stuff on their site

BTW: How did you find that article? What did you search for?

Hint: I converted your comment to an answer, as it is an answer in itself. It does not explain what is causing the problem, but it might help to do so! Please read the FAQ, for how this site works.

(15 Jun '14, 03:06) Kurt Knochner ♦

Thanks Kurt for encouraging words for a Learner like me,I googled for "retransmit syn ack even after receiving ack" and it worked.

(15 Jun '14, 05:55) kishan pandey

cool. Keep going ;-)

(15 Jun '14, 06:18) Kurt Knochner ♦

1

Based on the answer of @kishan pandey (TCP_DEFER_ACCEPT), here is a new attempt to explain what could have happened:

TCP_DEFER_ACCEPT makes the server wait for a data packet from the client, which never arrives, so the SYN-ACK is sent again.

Now, the interesting question is: Why is there no data packet (HTTP GET, POST, HEAD) from the client?

A possible answer: Because it's not the client who ACKs the SYN-ACK, but 'something' between the client and the server. And now we should have a look at your Fortinet Firewall (concluded from the MAC address used in the capture file).

Maybe the Fortinet acts as a TCP or SYN proxy, because there is a content scanning feature enabled on that firewall and that's causing problems.

client  ----------  Fortinet ------------ server
    SYN ->               SYN -> 
                               <- SYN-ACK (wrong check sum !!)
                         ACK ->
                               <- SYN-ACK (wrong check sum !!)
    SYN -> (dropped)     ACK ->

Maybe the client sends a SYN and the firewall forwards that SYN to the server. The server answers with a SYN-ACK (wrong check sum in the capture file), which gets answered by the firewall (SYN proxy, TCP proxy) with an ACK. But then the firewall does not forward the SYN-ACK to the client and also drops any further SYN (re-transmit) from the client, for whatever reason (Hint: the SYN-ACK from the server shows a wrong check sum in the capture file - which could be true or caused by TCP offloading on the server).

As the client has no chance to send the data packet (never seen the SYN-ACK), the server will resend its SYN-ACK due to the TCP_DEFER_ACCEPT option.

So, the key to analyze this problem is a capture file taken at the client and the server in parallel, to check who is doing what.

@navs_123456: Can you please post both capture files (client and server), so we can check my new theory :-)

Regards
Kurt

answered 15 Jun ‘14, 03:24

Kurt%20Knochner's gravatar image

Kurt Knochner ♦
24.8k1039237
accept rate: 15%

edited 15 Jun ‘14, 03:32

0

Looks like the ACK for the SYN-ACK does not get through to the server, so the SYN-ACK is sent again and again.

You have created the capture file on (or near) the client. What do you see in the capture file on (or near) the server at the same time?

Regards
Kurt

answered 12 Jun '14, 05:19

Kurt%20Knochner's gravatar image

Kurt Knochner ♦
24.8k1039237
accept rate: 15%

edited 12 Jun '14, 07:10

Hi Kurt, this is the capture on the linix web server 172.18.26.41 (used tcpdump), this is what puzzles me, the server is actually seeing ACK for the SYN ACK, or at least wireshark can see it.

(12 Jun '14, 08:02) navs_123456

this is the capture on the linix web server 172.18.26.41 (used tcpdump),

Ah, you're right. It's on the server. I messed up the delta times of SYN-ACK and ACK ;-)

his is what puzzles me, the server is actually seeing ACK for the SYN ACK, or at least wireshark can see it.

Yes, that's 'strange'. Is there any firewall (iptables) on that server?

What is the output of the following commands?

iptables -L -nv

and

conntrack -L

Furthermore:

netstat -ns -w

Do you see an 'increasing' number of the following?

IP: 
    0 with invalid addresses
    0 incoming packets discarded
    0 dropped because of missing route

netstat -ns -t

Do you see an 'increasing' number of the following?

Tcp:
    0 failed connection attempts
    0 bad segments received.
(12 Jun '14, 08:18) Kurt Knochner ♦

BTW: One thing I found in the capture file. The TSval values of the SYN and ACK frames are identical although the time delta is ~4ms. I've never heard about a linux kernel feature that drops frames like these, but you'll never know.

What do you see in the capture files from the same client, while a connection works. Is the TSval value identical as well (maybe set/updated by your Fortinet).

(12 Jun '14, 08:38) Kurt Knochner ♦

One more thing, as I have seen similar things recently. How many entries do see in the output of the following command, while the system behaves like you described it?

netstat -nat

How many of them are in TIME_WAIT and how many in ESTABLISHED and SYN_SENT state

netstat -nat > netstat.txt
grep TIME_WAIT netstat.txt | wc -l
grep ESTABLISHED netstat.txt | wc -l
grep SYN_SENT netstat.txt | wc -l
wc -l netstat.txt

(12 Jun '14, 08:50) Kurt Knochner ♦

HI Kurt

Dont believe any firewall is enabled.

iptables -L -nv

Chain INPUT (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes) pkts bytes target prot opt in out source destination

conntrack -L doesnt appear to exist

netstat -ns -w this looks ok to me.

Ip:

17642648 total packets received
0 forwarded
0 incoming packets discarded
17640734 incoming packets delivered
15326802 requests sent out
4 dropped because of missing route

netstat -ns -t

Tcp: looks like a few issues here.

1073142 active connections openings
511389 passive connection openings
5100 failed connection attempts
55827 connection resets received
25 connections established
17545014 segments received
19464507 segments send out
437359 segments retransmited
705 bad segments received.
265095 resets sent
(13 Jun '14, 02:43) navs_123456

HI

out of the 900 entries in netstat -nat i had the following (this was during the busy period)

15 CLOSE_WAIT
266 ESTABLISHED
3 FIN_WAIT2
126 SYN_RECV
510 TIME_WAIT
(13 Jun '14, 04:48) navs_123456

Hi, also my comment about the TSval seems to have dissapered.

Basically it said that i ran the test twice during a time where the connection was working normally, both times the delta between the SYN and ACK was 3ms, once the TSval incemented by 1 and the second time it didnt increment.

(13 Jun '14, 04:56) navs_123456

O.K. let's sum it up:

There is no firewall on the server, so nothing that could block the (valid) ACks.

There is no problem with the number of sessions, regardless of their state, as even 510 connections in TIME_WAIT is nothing for a busy web server.

There are some 'interesting' TCP counters (like 5100 failed connection attempts), but the reason for those are unknown. Could be related to your problem.

You are capturing on the server, so the ACKs should get through to the OS, as Wireshark sees them as well.

There seems to be no problem with the TSval time stamps, as you have observed working connections, with the same TSval value for the SYN and the ACK frame (at least that's how I understand your comment).

Unfortunately I'm running out of ideas. You could try to enable debug logging in apache and check if you get any results. However, as long as the OS does not accept the 3-way-handshake ACK, I doubt that apache will ever get 'notified' about the new connection. So, to me it looks like a problem in the OS (kernel), but I don't have any explanation to offer, as everything looks good.

One last thing: You can try to disable TCP offloading, if it is enabled. And then check if things are getting better.

http://wiki.wireshark.org/CaptureSetup/Offloading
http://sandilands.info/sgordon/segmentation-offloading-with-wireshark-and-ethtool

(13 Jun '14, 06:30) Kurt Knochner ♦

Hi Kurt, yes you have understood all correctly.

I am wondering about the 126 SYN_RECV messages, suggests the OS hasnt seen an ACK for these SYNs.

I will try some more troubleshooting and try and marry up a new wireshark capture with netstat -nat command, to see if wireshark see's the ACKs come in from the client but the netstat -nat is still sitting in SYN_RECV. This i guess would confirm its some sort of OS issue.

Thanks for your help.

(13 Jun '14, 08:27) navs_123456

I am wondering about the 126 SYN_RECV messages, suggests the OS hasnt seen an ACK for these SYNs.

Yes, that could be another sign for your problem. As I cannot find any problem within the ACK frame (checksum O.K., etc.) I have no explanation to offer why the OS might drop that ACK frame.

This i guess would confirm its some sort of OS issue.

agreed. BTW: What happens if you restart apache while you see that many connections in SYN_RECV state?

BTW: did you check/disable TCP offloading?

(13 Jun '14, 09:18) Kurt Knochner ♦

One more thing...

Can you please run the following command and post the output for some of the SYN_RECV connections

ss -t -epim

(13 Jun '14, 09:22) Kurt Knochner ♦
showing 5 of 11 show 6 more comments

0

Maybe the TCP or IP checksumm is incorrect. The trace file is gone again so I couldn't check myself


Ok, found the trace by removing the '.' from the URL and the checksums are correct.
Could you check whether syncookies are enabled on your server and test by disabling it. cat /proc/sys/net/ipv4/tcp_syncookies
126 SYN-RECV half-open connections look a bit high to me

answered 12 Jun '14, 13:13

mrEEde's gravatar image

mrEEde
3.9k152270
accept rate: 20%

edited 13 Jun '14, 12:01

Isn't the idea of SYN cookies to have no queue? So, if SYN cookies are enabled, there should be no SYN_RECV at all, as the socket will be 're-created' only after a valid ACK has been received. At least that's how I understand it. I might be wrong...

(13 Jun '14, 13:47) Kurt Knochner ♦

from http://lwn.net/Articles/277146/ : " Due to this limitation, and the modest computation overhead of the cryptographic hash, the Linux stack only resorts to syncookie based connections when the number of half-open connection exceeds a high watermark controlled by the net.ipv4.tcp_max_syn_backlog sysctl." I was thinking that the syn_backlog might be sitting at 128 and only then the syn_cookie code would kick in. But ... I might be wrong ;-)

Should be easy enough though to try it out and feedback

(14 Jun '14, 05:51) mrEEde

I was thinking that the syn_backlog might be sitting at 128 and only then the syn_cookie code would kick in.

Good point regarding the 'late use' SYN cookies!

But then, if the number of half open connections is the trigger for using SYN cookies, how would disabling SYN cookies fix the problem that caused the half open connections? ;-))

(15 Jun '14, 03:03) Kurt Knochner ♦