This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Unusual Behavior with Stacked VLAN Tags and Capture Filter

I have a strange issue that I have tried to research, and I would welcome input from anyone who has encountered something similar.

Specifics: VM Ubuntu 13 server on ESXi 5.0 with 3 NICs (type VMXNET3). (Note: Today I just rebuilt the VM using the new Ubuntu 14 LTS.) Eth0 is for the local network. Eth1 and Eth2 are dedicated to port mirrors from switches/taps. Eth1 only has single VLAN tags in the mirrored traffic. Eth2 has what I consider stacked tags. (I have included an image below.)

When I attempt to use a capture filter for VLAN 992 (993, 994, etc), I do not capture any data. I can use a display filter to show the VLAN 992 after it has been captured but this isn't what I desire for troubleshooting purposes due to the high amount of traffic.

I can capture filter the second VLANs (811, 812, etc) just fine.

I'm not sure where the issue resides.

I have experimented with other E1000 NICs in ESXi. I have the virtual switch set properly in ESX for promisc. (If this was set incorrectly I wouldn't have any traffic in the captures.) I have the VM NICs set for promisc (even though they appear not to need it since ESX is handling it). The version of Wireshark is 1.10.6 from the apt repository and whatever pcap is included - I mention this to confirm it's not an issue I created with a custom build/install of source.

Although I don't think it is worth going into a great amount of detail, I did install a CentOS 6.5 VM today as well and did perform a custom build/install of pcap and wireshark. Same NIC parameters. When I did this, I was no longer able to see the outer VLAN tag. I'm not sure if that is giving me a clue or not.

I know someone smarter than me could likely shed some light on my mystery.

Thanks for any assistance!

alt text

asked 17 Apr '14, 15:26

stjaru

edited 17 Apr '14, 15:28

There are answers below about how to filter for the inner tag, but when I read this question, it sounds more to me like you're unable to filter the outer tag, 992 in this case. To quote:

When I attempt to use a capture filter for VLAN 992 (993, 994, etc), I do not capture any data. I can use a display filter to show the VLAN 992 after it has been captured but this isn't what I desire for troubleshooting purposes due to the high amount of traffic.

So when you apply a capture filter of "vlan 992", you don't capture anything? But if you don't apply any capture filter, you can later apply a Wireshark display filter of "vlan.id == 992" to find the packets of interest? Is that right?

(20 Apr '14, 10:17) cmaynard ♦♦

There are answers below about how to filter for the inner tag,

well, my answer is actually primarily about the problem of filtering several outer tags in one capture filter statement.

(20 Apr '14, 14:46) Kurt Knochner ♦

Thank you Guy for participating in the thread. Good stuff!

Kurt, wow man, that was fantastic information! I appreciate you taking the time to give such excellent detail and sourcing! I'm very appreciative of everyone taking some time to help on this topic.

Now that I better understand why my other assumption was incorrect, would anyone have any ideas about why I cannot capture successfully on just the outer tag? That would be the heart of the issue.

I'm still unable to understand why I cannot use capture with "vlan 99x" and see packets (which is the answer to cmaynard's request for clarification/confirmation). Again, I cannot capture using that filter, but I can display to see the VLAN.

Thanks all,

(20 Apr '14, 20:17) stjaru

2 Answers:

Have you tried filtering on both VLAN ids at the same time? Something like "vlan 992 and vlan 811"? If I remember correctly this is required when filtering on QinQ traffic.

answered 17 Apr '14, 15:31

Jasper ♦♦

Thanks for your input Jasper. :)

I have tried that without success.

Capture filter "vlan 992 or vlan 811" will not collect anything.

But I did discover something interesting - Capture filter "vlan 810 or vlan 811" will only collect the first VLAN (810). I would not expect a problem with that capture filter.

However since I have never been able to capture filter on the outer tag, I normally capture everything on the interface and then use a display filter on the outer tag. Hence the discovery of the behavior.

I don't know if these problems are related but I'm still wrestling with the original question.

(17 Apr '14, 15:49) stjaru

I would not expect a problem with that capture filter.

I would, but that's because I know the rather kludgy way that the "vlan" capture filter works. "vlan" turns everything to the right of it into a test for traffic under that VLAN, so "vlan 810 or vlan 811" doesn't do what you'd expect.

(18 Apr '14, 13:57) Guy Harris ♦♦

Capture filter "vlan 992 or vlan 811" will not collect anything.
Capture filter "vlan 810 or vlan 811" will only collect the first VLAN (810).

As @Guy Harris already mentioned, the vlan capture filter primitive does some magic behind the scenes, so it does not behave as you might expect from other logical OR operations in capture filters.

See man page of pcap-filter:

vlan [vlan_id]
   Note that the first vlan keyword encountered in expression changes 
   the decoding offsets for the remainder of expression on the 
   assumption that the packet is a VLAN packet.

The vlan [vlan_id] expression may be used more than once, to filter on VLAN hierarchies. Each use of that expression increments the filter offsets by 4.
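To make the offsets concrete, here is a minimal Python sketch (not taken from libpcap) that builds an Ethernet header with stacked tags and reads the fields at the byte positions the filter code works with. The helper names build_frame and u16 are made up for illustration, and the sketch assumes that both tags use the 0x8100 TPID; some QinQ setups use a different outer TPID such as 0x88a8.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag (assumed for both tags)


def build_frame(vlan_ids, payload_ethertype=0x0800):
    """Build a minimal Ethernet header with zero or more stacked VLAN tags.

    Hypothetical helper for illustration: MAC addresses are all zero and
    the tags are written outermost first.
    """
    frame = bytes(6) + bytes(6)  # dummy destination and source MAC (12 bytes)
    for vid in vlan_ids:
        frame += struct.pack("!HH", TPID_8021Q, vid & 0x0FFF)  # 4 bytes per tag
    frame += struct.pack("!H", payload_ethertype)
    return frame


def u16(frame, offset):
    """Read a big-endian 16-bit value, like the BPF 'ldh [offset]' instruction."""
    return struct.unpack_from("!H", frame, offset)[0]


frame = build_frame([992, 811])
print(hex(u16(frame, 12)))      # 0x8100 -> outer tag starts at offset 12
print(u16(frame, 14) & 0x0FFF)  # 992    -> outer VLAN ID at offset 14
print(hex(u16(frame, 16)))      # 0x8100 -> inner tag starts 4 bytes later
print(u16(frame, 18) & 0x0FFF)  # 811    -> inner VLAN ID at offset 18
print(hex(u16(frame, 20)))      # 0x800  -> payload ethertype after both tags
```

Each tag is 4 bytes long, which is why every additional vlan keyword moves the offsets by 4.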

So, let's have a look at the BPF code for the following capture filter

tcpdump -ni eth0 -d 'vlan 100'

(000) ldh      [12]
(001) jeq      #0x8100          jt 2    jf 6
(002) ldh      [14]
(003) and      #0xfff
(004) jeq      #0x64            jt 5    jf 6
(005) ret      #65535
(006) ret      #0

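(000): Load the two bytes at offset 12, where the ethertype is located.
(001): Is it a VLAN tag (0x8100)?
(002-004): If true: load the value at position 14 (the VLAN tag), mask it with 0xfff to keep the 12-bit VLAN ID, and compare it with 0x64 (100 decimal).
(005-006): Accept the frame if the ID matches, otherwise drop it.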

So far, so good.
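The same seven instructions can be written as a small Python function. This is only a hand translation for illustration, assuming the frame is available as a bytes object; the names vlan_100_filter and make_frame are hypothetical.

```python
import struct


def vlan_100_filter(frame: bytes) -> bool:
    """Hand translation of the BPF program for 'vlan 100' (illustrative only)."""
    if len(frame) < 16:
        return False  # too short to hold a VLAN tag, so reject
    ethertype = struct.unpack_from("!H", frame, 12)[0]  # (000) ldh [12]
    if ethertype != 0x8100:                              # (001) jeq #0x8100
        return False                                     # (006) ret #0
    tci = struct.unpack_from("!H", frame, 14)[0]         # (002) ldh [14]
    vlan_id = tci & 0x0FFF                               # (003) and #0xfff
    return vlan_id == 0x64                               # (004) jeq #0x64, then (005) or (006)


def make_frame(vlan_ids):
    """Hypothetical test helper: zeroed MACs, stacked 0x8100 tags, IPv4 ethertype, padding."""
    tags = b"".join(struct.pack("!HH", 0x8100, vid) for vid in vlan_ids)
    return bytes(12) + tags + struct.pack("!H", 0x0800) + bytes(32)


print(vlan_100_filter(make_frame([100])))       # True: single tag with ID 100
print(vlan_100_filter(make_frame([200])))       # False: tagged, but a different ID
print(vlan_100_filter(make_frame([])))          # False: untagged frame
print(vlan_100_filter(make_frame([100, 811])))  # True: only the outer tag is checked
```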

Now let's check the BPF code of the following capture filter

tcpdump -ni eth0 -d 'vlan 100 or vlan 200'

(000) ldh      [12]
(001) jeq      #0x8100          jt 2    jf 5
(002) ldh      [14]
(003) and      #0xfff
(004) jeq      #0x64            jt 10   jf 5
(005) ldh      [16]             <<=== PROBLEM HERE !!!
(006) jeq      #0x8100          jt 7    jf 11
(007) ldh      [18]
(008) and      #0xfff
(009) jeq      #0xc8            jt 10   jf 11
(010) ret      #65535
(011) ret      #0

(000-004): same as before
(005): This is where the problem occurs. Because the first vlan primitive increases the decoding offset by 4 regardless of the logical operator that follows (see the man page above), the second vlan primitive looks for the ethertype in the wrong place. For a logical OR it should look at position 12 in the Ethernet frame again, but because of the offset increase it looks at position 16, which is wrong for an OR of two VLAN IDs (at least if you assume it works like other OR operations).

Due to this behavior (call it a bug or not), you cannot capture several VLAN tags in a single capture filter by combining vlan primitives with a logical OR. Furthermore, if you use a logical AND, you will only see double-tagged (QinQ) frames, as @Jasper mentioned.
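To see the effect without a live capture, the twelve instructions of the 'vlan 100 or vlan 200' listing can be run through a tiny interpreter. This is a rough Python sketch that only understands the four opcodes used in that listing; PROGRAM is transcribed by hand, and run_bpf and make_frame are hypothetical names, not part of any real library.

```python
import struct

# Instructions of 'vlan 100 or vlan 200', transcribed by hand from the tcpdump -d listing.
PROGRAM = [
    ("ldh", 12),
    ("jeq", 0x8100, 2, 5),
    ("ldh", 14),
    ("and", 0x0FFF),
    ("jeq", 0x64, 10, 5),
    ("ldh", 16),             # second 'vlan' test starts at offset 16, not 12
    ("jeq", 0x8100, 7, 11),
    ("ldh", 18),
    ("and", 0x0FFF),
    ("jeq", 0xC8, 10, 11),
    ("ret", 65535),
    ("ret", 0),
]


def run_bpf(program, frame):
    """Interpret the small BPF subset used above; returns the 'ret' value."""
    acc, pc = 0, 0
    while True:
        op, *args = program[pc]
        if op == "ldh":
            if args[0] + 2 > len(frame):
                return 0  # a load past the end of the packet rejects it
            acc = struct.unpack_from("!H", frame, args[0])[0]
            pc += 1
        elif op == "and":
            acc &= args[0]
            pc += 1
        elif op == "jeq":
            value, jump_true, jump_false = args
            pc = jump_true if acc == value else jump_false
        elif op == "ret":
            return args[0]
        else:
            raise ValueError(f"unsupported opcode: {op}")


def make_frame(vlan_ids):
    """Hypothetical test helper: zeroed MACs, stacked 0x8100 tags, IPv4 ethertype, padding."""
    tags = b"".join(struct.pack("!HH", 0x8100, vid) for vid in vlan_ids)
    return bytes(12) + tags + struct.pack("!H", 0x0800) + bytes(32)


print(run_bpf(PROGRAM, make_frame([100])))       # 65535: single tag 100 is accepted
print(run_bpf(PROGRAM, make_frame([200])))       # 0: single tag 200 is dropped
print(run_bpf(PROGRAM, make_frame([300, 200])))  # 65535: 200 only matches as an inner tag
```

In this sketch the second ID only ever matches as an inner tag, which is consistent with "vlan 810 or vlan 811" collecting only VLAN 810 on single-tagged traffic.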

Solution: Run several instances of tcpdump, each with a single vlan capture filter and later merge the capture files with mergecap.

tcpdump -ni eth0 -w /tmp/vlan811.pcap 'vlan 811' &
tcpdump -ni eth0 -w /tmp/vlan800.pcap 'vlan 800' &
tcpdump -ni eth0 -w /tmp/vlan900.pcap 'vlan 900' &
sleep 500
killall tcpdump
mergecap -w /tmp/vlan800+811+900.pcap /tmp/vlan800.pcap /tmp/vlan811.pcap /tmp/vlan900.pcap

+++ UPDATE +++

I have to correct myself. It is possible to capture multiple (outer) vlan tags in a single capture filter, by doing the vlan tag matching 'manually'.

tcpdump -ni eth0 'vlan and (ether[14:2]&0xfff=100 or ether[14:2]&0xfff=200)'

If you look at the BPF code, you'll see that it is essentially the check for 'vlan 100' combined with the check for 'vlan 200' at the same offset, which is what one would expect 'vlan 100 or vlan 200' to do.

(000) ldh      [12]
(001) jeq      #0x8100          jt 2    jf 7
(002) ldh      [14]
(003) and      #0xfff
(004) jeq      #0x64            jt 6    jf 5
(005) jeq      #0xc8            jt 6    jf 7
(006) ret      #65535
(007) ret      #0

(000): Load the two bytes at offset 12, where the ethertype is located.
(001): Is it a VLAN tag (0x8100)?
(002-003): If true: load the value at position 14 (the VLAN tag) and mask it with 0xfff to keep the VLAN ID
(004): compare the VLAN ID with 0x64 (100 decimal)
(005): compare the VLAN ID with 0xc8 (200 decimal)
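For more than two IDs the filter string gets long, so it can be generated. The following Python sketch uses a hypothetical helper, outer_vlan_filter, that builds a filter in the same form as the one above, plus a pure-Python matches_outer_vlan that mirrors the logic of the BPF listing. Whether the resulting filter matches on a given system still depends on the outer tag actually being present in the frame data the filter sees.

```python
import struct


def outer_vlan_filter(vlan_ids):
    """Build 'vlan and (ether[14:2]&0xfff=A or ether[14:2]&0xfff=B ...)' for the given IDs."""
    ids = list(vlan_ids)
    if not ids:
        raise ValueError("need at least one VLAN ID")
    for vid in ids:
        if not 0 <= vid <= 0x0FFF:
            raise ValueError(f"VLAN ID out of range: {vid}")
    tests = " or ".join(f"ether[14:2]&0xfff={vid}" for vid in ids)
    return f"vlan and ({tests})"


def matches_outer_vlan(frame, vlan_ids):
    """Mirror of the BPF listing: 0x8100 at offset 12, then compare the masked value at 14."""
    if len(frame) < 16:
        return False
    if struct.unpack_from("!H", frame, 12)[0] != 0x8100:
        return False
    return (struct.unpack_from("!H", frame, 14)[0] & 0x0FFF) in set(vlan_ids)


def make_frame(vlan_ids):
    """Hypothetical test helper: zeroed MACs, stacked 0x8100 tags, IPv4 ethertype, padding."""
    tags = b"".join(struct.pack("!HH", 0x8100, vid) for vid in vlan_ids)
    return bytes(12) + tags + struct.pack("!H", 0x0800) + bytes(32)


print(outer_vlan_filter([100, 200]))
# vlan and (ether[14:2]&0xfff=100 or ether[14:2]&0xfff=200)

print(matches_outer_vlan(make_frame([200]), [100, 200]))       # True
print(matches_outer_vlan(make_frame([200, 811]), [100, 200]))  # True: outer tag matches
print(matches_outer_vlan(make_frame([300]), [100, 200]))       # False
```

The generated string can be passed to tcpdump as a single quoted argument.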

Regards
Kurt

answered 19 Apr '14, 16:45

Kurt Knochner ♦

edited 20 Apr '14, 14:48

I'm still unable to understand why I cannot use capture with "vlan 99x" and see packets

O.K. we need more information. Here are some questions for you:

  • How did you capture the traffic (tcpdump, dumpcap, tshark, Wireshark)?
  • Is your capturing system configured to use VLAN tags on the capturing interface?

You say:

When I attempt to use a capture filter for VLAN 992 (993, 994, etc) - OUTER tag - , I do not capture any data.
I can capture filter the second VLANs (811, 812, etc) - INNER tag - just fine.

O.K. so, either there are no frames with VLAN tag 99x, or something on your system strips the outer VLAN tag before the capturing system gets the frame, which would explain why you do see the inner tag.

However, if the outer tag had been removed, that would not explain why you see the outer tag in the capture file with a display filter. But maybe I misunderstand what you did in that case!?

So, can you please add more details about your capturing setup: What do you see in the capture file, if you

  • don't use any capture filter
  • use a capture filter for the outer tag: vlan 99x
  • use a combined capture filter for the outer tag: vlan 99x or vlan 99y
  • use a capture filter for the inner tag: vlan 88x
  • use a combined capture filter for the inner tag: vlan 88x or vlan 88y
  • use a capture filter as shown in my answer: vlan and ether[14:2]&0xfff=99x

BTW: Can you provide a sample capture somewhere (google docs, dropbox, cloudshark.org)?

(21 Apr '14, 03:27) Kurt Knochner ♦