Domain Client Restartup - Ultimate Challenge

asked 2018-01-25 23:24:45 +0000

TekNacion

I have a problem where client computers connect to the domain just fine. I can ping devices by name and by IP address, and I can access shares. But at random intervals (it could be hours or days) I lose access to domain resources such as shares, although I can still ping devices. To fix the problem I have to enter the current IP address as an exclusion in order to force the client computer to request a new IP address. Once the new address is assigned, everything is back to normal. Restarting the computer does not fix the problem. I figured that the problem is that the computer itself loses authentication with AD. If I delete the DNS entry for the bad IP address, the entry is not re-added when the computer is restarted, but once the computer gets a new IP address the DNS entry is updated. Therefore, I decided that troubleshooting should focus on what happens before the user logs in.

So I started to gather Wireshark captures. Since the client computers are in a separate subnet from the Domain Controller, I placed a capture computer in each subnet (one in the client subnet and one in the server subnet). I gathered semi-synchronized captures from the bad IP address and from the good IP address. Since I don't have enough points to upload files, here is the link to the files:

http://teknacion.com/wireshark

Thank you very much in advance. Let's see who is the first to help me solve this mystery.


2 Answers


answered 2018-01-27 08:29:23 +0000

mrEEde

One obvious problem in both traces is that your MTU size of 1500 bytes is not available across the whole path.

goodclient.pcapng, filtered on port 49169, shows PMTUD working from the client towards the server, with the Cisco at 192.160.10.1 indicating that the next-hop MTU size is 1446. The server, however, does not set the DF bit in the IP header and therefore doesn't learn about the available MTU size; after several retransmissions it reverts to the default MSS of 536 bytes, 2.4 seconds into the session.
tcp.port==49169 && (icmp||tcp.len>=536||tcp.flags&7)

The server side shows that PMTUD is not working correctly, obviously because the router didn't send an ICMP message back. The MSS is changed to 536 bytes and the DF bit is turned on in a last effort to get the data to the client. Note that this is the 'good' case.

In the bad case the MSS stays at 1460 bytes and the data never reaches the client, resulting in the server resetting the connection with tcp.seq still sitting at the first segment:
tcp.seq<1462 and tcp.flags&4
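The `tcp.flags&7` and `tcp.flags&4` terms in the display filters of this answer are bit masks over the TCP flags field. A minimal Python sketch of what they select, assuming the standard TCP flag bit values (the helper names are made up for illustration):

```python
# TCP flag bits (low bits of the TCP flags field).
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


def matches_mask(flags: int, mask: int) -> bool:
    """Mimic a 'tcp.flags & mask' test: true if any masked bit is set."""
    return (flags & mask) != 0


def flag_names(flags: int) -> list[str]:
    """Return the names of the flags set in the given flags value."""
    table = [("FIN", FIN), ("SYN", SYN), ("RST", RST), ("PSH", PSH), ("ACK", ACK)]
    return [name for name, bit in table if flags & bit]


# Mask 7 = FIN | SYN | RST: connection setup and teardown segments.
# Mask 4 = RST only: the resets sent by the server in the bad case.
for flags in (SYN, SYN | ACK, ACK, PSH | ACK, RST | ACK, FIN | ACK):
    print(flag_names(flags), matches_mask(flags, 7), matches_mask(flags, 4))
```

So a mask of 7 keeps the handshake, FIN and RST segments in view next to the large data segments, while a mask of 4 isolates the resets.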

So the problem is that the Cisco at the server side does not send ICMP messages when fragmentation is required. Ideally it should adjust the MSS as the SYN packets pass through, to avoid the need for the ICMP message in the first place.

Whether or not this solves all your problems I cannot tell but I would start here...

Regards Matthias


Comments

Thank you for your response. Actually, both routers are the exact same model (RV130W), purchased at the same time and configured identically. The only available setting for MTU is in the WAN section, and the only options are Auto or Manual. Both are set to Auto.

So are you suggesting that the router on the server end is defective? Or perhaps I should change the MTU setting to Manual? If so, what should the manual MTU setting be?

Thanks,

TekNacion ( 2018-01-27 10:34:39 +0000 )

Setting an MTU size of 1446 at the server should bypass this problem.

mrEEde ( 2018-01-27 14:18:40 +0000 )

Hello. Thanks again for the help. I obviously do not specialize in network traffic analysis. This network is a VPN network, with the server router being the center of a Site-to-Site setup with 5 other VPN routers. The connections go through different ISPs. Should I just set the MTU to the same value on all routers instead of setting it to Auto? I believe the MTU value to use will depend on the ISP with the lowest MTU value, correct?

TekNacion ( 2018-01-27 17:51:15 +0000 )

Hello. I just performed another double-sided capture and determined that the router on the server side is indeed sending the ICMP packet with the Next Hop MTU data. The reason it did not show up in the trace file I submitted is the capture filter that was applied. The capture filter on the server side was set to capture traffic only for the client computer, so the traffic from the router was not in the capture file.

http://teknacion.com/wireshark/captur...

The problem is that the server does not adjust when the router sends the ICMP message and keeps sending the packet at the MSS established during the handshake. I don't know where to configure the server to respond to the ICMP message.

Do you think that setting the MTU to 1500 on all the routers would fix this?

Thanks

TekNacion ( 2018-01-27 22:39:37 +0000 )


answered 2018-01-28 07:11:16 +0000

TekNacion

Again, thank you very much for your response. It did point me in the right direction. I believe I have solved the issue. For the sake of documentation I will write up my solution.

During the TCP handshake both endpoints send each other their MSS, which is the lesser of the receive buffer or the result of MTU - 40. Both endpoints will then send segments no larger than the smaller MSS. This calculation is based on the NIC's MTU, so if the NIC's MTU is 1500 the resulting MSS is 1460.
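As a rough illustration of that calculation, here is a minimal Python sketch. It assumes IPv4 and TCP headers without options (20 bytes each) and ignores the receive-buffer limit; the function names are made up for this example.

```python
IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20  # bytes, TCP header without options (assumption)


def advertised_mss(nic_mtu: int) -> int:
    """MSS an endpoint announces in its SYN, derived from its NIC MTU."""
    return nic_mtu - IP_HEADER - TCP_HEADER


def effective_mss(client_mtu: int, server_mtu: int) -> int:
    """Segment size used on the connection: the smaller of the two MSS values."""
    return min(advertised_mss(client_mtu), advertised_mss(server_mtu))


print(advertised_mss(1500))       # MSS for a NIC MTU of 1500
print(effective_mss(1500, 1446))  # one side lowered to 1446
```

With both NICs at 1500 the connection uses 1460; once one side is lowered to 1446 the connection uses 1406.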

In my setup, for some reason PMTUD is reporting the WAN MTU (the MTU of the next hop) for the router as 54 bytes less than the configured MTU. So, if the WAN MTU is 1500, then the MTU of the next hop is reported as 1446. When the server sent a segment of 1460 bytes, it became a 1500-byte packet after the 40 bytes of TCP and IP headers were added. This size is greater than the MTU of the next hop and required fragmentation.
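The arithmetic can be checked with a short Python sketch. The 40-byte header size assumes IPv4 and TCP without options, and the function names are hypothetical.

```python
HEADERS = 40  # bytes, IPv4 + TCP headers without options (assumption)


def packet_size(mss: int) -> int:
    """Size of the IP packet carrying a full-sized TCP segment."""
    return mss + HEADERS


def needs_fragmentation(mss: int, next_hop_mtu: int) -> bool:
    """True if a full-sized segment does not fit through the next hop."""
    return packet_size(mss) > next_hop_mtu


def largest_fitting_mss(next_hop_mtu: int) -> int:
    """Largest MSS whose packets still fit the next-hop MTU."""
    return next_hop_mtu - HEADERS


configured_mtu, next_hop_mtu = 1500, 1446
print(configured_mtu - next_hop_mtu)            # overhead taken by the tunnel
print(needs_fragmentation(1460, next_hop_mtu))  # the failing case
print(largest_fitting_mss(next_hop_mtu))        # MSS that avoids fragmentation
```

A 1460-byte segment needs 1500 bytes on the wire, which exceeds 1446, while anything up to an MSS of 1406 passes unfragmented.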

The problem I was experiencing was that the server would not fragment when the ICMP message informed it of the need to fragment until it had tried sending the packet 3 times. On the fourth attempt it would sometimes fragment to 576. When it did not fragment, the server would reset the connection because the client never received the expected packet, so the computer could not complete the authentication conversation.

Checking and changing the MTU of the NIC required the following two commands at the CMD prompt:

netsh interface ipv4 show subinterface (this command displays the MTU of each NIC)
netsh interface ipv4 set subinterface "Local Area Connection" mtu=1446 store=persistent (this command changes the MTU of the NIC)
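One way to sanity-check an MTU value before or after changing it is a ping with the Don't Fragment flag set (on Windows, `ping -f -l <size> <host>`). The Python sketch below only computes the payload size and builds the command line; it does not run it. The host name `dc01.example.local` is a placeholder, not a name from this network, and the 28-byte overhead assumes an IPv4 header without options plus the ICMP echo header.

```python
IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
ICMP_HEADER = 8  # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ping payload that fits in one packet of the given MTU."""
    return mtu - IP_HEADER - ICMP_HEADER


def df_ping_command(host: str, mtu: int) -> list[str]:
    """Windows ping arguments: -f sets Don't Fragment, -l sets the payload size."""
    return ["ping", "-f", "-l", str(icmp_payload_for_mtu(mtu)), host]


host = "dc01.example.local"  # hypothetical host name
for mtu in (1500, 1446):
    print(" ".join(df_ping_command(host, mtu)))
```

If the path really is limited to 1446, the 1418-byte probe should get replies while the 1472-byte probe should report that the packet needs to be fragmented.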

This is a very crude explanation because I'm running out of available characters. But the takeaway is that the NIC's MTU has to be less than the router's MTU in order to leave room for the VPN headers and avoid fragmentation.


Comments

Right, this is basically what happened. And your solution is one way to bypass the problem. It requires each host to be reconfigured to match the bottleneck's MTU size, which could potentially change if you choose a route through a different ISP. The better solution would be to use Cisco's TCP adjust MSS function.
The MSS should be 40 bytes less than the MTU size defined in the tunnel.

ip mtu 1446
ip tcp adjust-mss 1404

See https://learningnetwork.cisco.com/thread/122530
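Conceptually, MSS adjustment means the router rewrites the MSS option in SYN segments passing through it, so neither endpoint ever sends segments too large for the tunnel. A minimal Python sketch of that idea follows; it is not Cisco's implementation, the function names are made up, and the 40-byte header size assumes IPv4 and TCP without options.

```python
HEADERS = 40  # bytes, IPv4 + TCP headers without options (assumption)


def clamp_mss(syn_mss: int, clamp_value: int) -> int:
    """MSS the far end sees after the router lowers the SYN's MSS option if needed."""
    return min(syn_mss, clamp_value)


def fits_tunnel(mss: int, tunnel_mtu: int) -> bool:
    """True if a full-sized segment plus headers fits the tunnel MTU."""
    return mss + HEADERS <= tunnel_mtu


tunnel_mtu, clamp_value = 1446, 1404  # values from the commands quoted above
for syn_mss in (1460, 1404, 536):
    clamped = clamp_mss(syn_mss, clamp_value)
    print(syn_mss, clamped, fits_tunnel(clamped, tunnel_mtu))
```

An MSS that is already small is left alone; only values above the clamp are lowered, which is why hosts would not need their NIC MTU changed.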

mrEEde ( 2018-01-28 07:42:28 +0000 )

The problem is that these are Cisco RV130W routers, which don't offer Cisco IOS access, so there is no way to adjust the tunnel's MTU, only the WAN MTU, unless there is some SSH hack. The only access to the routers is through the web portal.

TekNacion ( 2018-01-28 07:49:17 +0000 )
