Help with "TCP previous segment not captured"

asked 2024-07-04 15:03:29 +0000

pcap_file: https://www.cloudshark.org/captures/8...

I have an ESXi cluster (6.7) with 2 ESXi hosts (A & B), managed by a vCenter Server Appliance (VCSA) 7.0. Networking is managed by NSX-V using 2 subnets. One subnet has internet access, and one does not. VMs on ESXi A work great; there are no issues. However, on ESXi B, VMs on the subnet with internet access cannot reach some websites.

I can reach www.facebook.com but cannot reach www.google.com. The pcap confirms there are no DNS issues. The configuration is the same for both ESXi hosts. I am perplexed by why some websites work while others do not. I do not understand the "TCP previous segment not captured" message that shows up around the Client Hello to Google. Reading some blogs, it seems to indicate packet loss. But why do packets to one website go through while packets to others don't? Everything is back to normal if I migrate this VM to ESXi host A. Is ESXi host B doing something? The firewall settings and network configuration are identical on both ESXi hosts.

Perimeter firewall allows outbound 443 without any filtering.

Comments

Can you capture traffic on the WAN interface directly connected to the service provider? In the 3-way handshake, Facebook and Google both use smaller MSS values than anticipated, so I would suspect an overlay. What's interesting is that packet 10 from Facebook has a TCP length of 3242 and a valid checksum. You need a stateful firewall or something similar to do that.

BigFatCat ( 2024-07-05 04:54:20 +0000 )

answered 2024-07-05 12:14:33 +0000

SYN-bit

18600 ●9 ●361 ●255 https://SYN-b.it

The VXLAN encapsulation done by NSX-V adds extra headers to each packet, making full-size packets too large for Ethernet networks with a normal MTU of 1500. Are you sure all interfaces that transport VXLAN-encapsulated frames have their MTU increased to (the advised) 1600?
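
To put rough numbers on that overhead, here is a small Python sketch. It assumes an IPv4 underlay and no 802.1Q tag on the encapsulated frame, which gives the commonly quoted 50 bytes of VXLAN overhead. The function names are mine, and the real figure on your hosts may differ (for example with an inner VLAN tag or IPv6 transport).

```python
# Per-packet overhead added by VXLAN, in bytes.
# Assumption: IPv4 underlay, no 802.1Q tag on the encapsulated frame.
INNER_ETHERNET = 14  # the guest's Ethernet header travels inside the tunnel
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20
VXLAN_OVERHEAD = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4

IP_AND_TCP_HEADERS = 40  # 20 (IPv4) + 20 (TCP); TCP options come out of the MSS


def required_underlay_mtu(guest_mtu: int) -> int:
    """Smallest underlay MTU that carries a full-size guest packet unfragmented."""
    return guest_mtu + VXLAN_OVERHEAD


def largest_safe_mss(underlay_mtu: int) -> int:
    """Largest TCP MSS whose full-size segments still fit after encapsulation."""
    return underlay_mtu - VXLAN_OVERHEAD - IP_AND_TCP_HEADERS


print("VXLAN overhead:", VXLAN_OVERHEAD)
print("Underlay MTU needed for a 1500-byte guest MTU:", required_underlay_mtu(1500))
print("Largest safe MSS on a 1500-byte underlay:", largest_safe_mss(1500))
print("Largest safe MSS on a 1600-byte underlay:", largest_safe_mss(1600))
```

Under these assumptions a 1500-byte guest MTU needs at least 1550 bytes on the underlay, which is why 1600 leaves comfortable headroom.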

Looking at the three sessions:

Facebook: The MSS is 1392, which is apparently low enough that no oversized segments are created.
Commvault: Uses the default MSS of 1460. There is a 1460-byte segment that does not get through, but in frame 48 the server tries a smaller (1024-byte) segment and that one does get through. So the session is slow, but it works because the server falls back to smaller segments.
Google: Uses an MSS of 1412 and there are 2800 TCP bytes missing, which is exactly 2 x (1412 - 12). The -12 is because of the TCP timestamp (TS) option carried in each packet. So it seems the 1400-byte TCP segment, with all the encapsulation, ends up too big for the network card in host B.
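
The arithmetic for the three sessions can be checked with a few lines of Python. This is only a sketch: the 50-byte VXLAN overhead and the 1500-byte underlay MTU are my assumptions, not values read from the capture, and the helper names are made up for illustration.

```python
VXLAN_OVERHEAD = 50       # assumed: inner Ethernet 14 + VXLAN 8 + UDP 8 + outer IPv4 20
UNDERLAY_MTU = 1500       # assumed MTU on the path carrying the VXLAN frames
IP_AND_TCP_HEADERS = 40   # IPv4 (20) + TCP (20); TCP options are counted inside the MSS
TIMESTAMP_OPTION = 12     # TCP timestamp option including padding


def outer_packet_size(mss: int) -> int:
    """Outer IP packet size for a full-size segment sent with this MSS."""
    return mss + IP_AND_TCP_HEADERS + VXLAN_OVERHEAD


def fits(mss: int) -> bool:
    """True if a full-size segment still fits in the assumed underlay MTU."""
    return outer_packet_size(mss) <= UNDERLAY_MTU


# Google: each full segment carries the MSS minus the timestamp option.
google_payload = 1412 - TIMESTAMP_OPTION
missing_segments, remainder = divmod(2800, google_payload)
print(f"Google: {missing_segments} missing segments of {google_payload} bytes "
      f"(remainder {remainder})")

# MSS values taken from the three sessions, plus the 1024-byte retry.
sessions = {
    "Facebook": 1392,
    "Commvault": 1460,
    "Commvault retry": 1024,
    "Google": 1412,
}
for name, mss in sessions.items():
    print(f"{name}: {mss} -> outer packet {outer_packet_size(mss)} bytes, "
          f"fits: {fits(mss)}")
```

With these assumptions, Facebook's largest packets come out at 1482 bytes and pass, while Google's (1502) and Commvault's (1550) exceed 1500. The 1024-byte retry is well under the limit, which would match what the capture shows.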

Next steps for me would be:

Check the interface settings on the NICs of both HostA and HostB; pay attention to MTU, enabled offloading features, and firmware/driver versions
Make a packet capture of the VM when running on HostA, capturing both in the VM and on the NIC facing the external network
Do the same for HostB and compare the 4 pcap files (and please do share again so we can have a look too :-))


Comments

I changed the DVS MTU from 1500 to 9000 and everything is back to normal. Tested with machines that have both VMXNET3 and E1000 cards. All looks good. I have no idea why everything suddenly broke. Thank you for the detailed analysis :)

d10deepu ( 2024-08-05 13:03:54 +0000 )

Glad to hear you were able to solve the issue!

SYN-bit ( 2024-08-05 15:03:56 +0000 )
