tshark - SSID name has UTF-8 replacement character?

asked 2019-11-15 17:37:16 +0000

flariut

updated 2019-11-21 15:11:12 +0000

grahamb


Hi there, I'm having a problem with this command in tshark, trying to get 802.11 probe requests in monitor mode:

tshark -o nameres.mac_name:FALSE -l -I -i wlan0mon -Y "wlan.ssid != 0" "wlan type mgt subtype 0100" > ./tshark_output

Everything works, except that some lines look like this:

 7067 2122.734629754 e0:98:61:xx:xx:xx → ff:ff:ff:ff:ff:ff 802.11 144 Probe Request, SN=615, FN=0, Flags=........C, SSID=administraci\357\277\275

That SSID should (probably) read "administración"; as you can see, tshark is replacing the "ó" with the UTF-8 replacement character (U+FFFD, shown here as the octal-escaped bytes \357\277\275).
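For reference, the escaped bytes in that line can be checked with a short Python sketch. The helper name unescape_octal is hypothetical and only handles three-digit octal escapes of the form shown above; it is an illustration, not part of tshark.

```python
import re

def unescape_octal(text: str) -> bytes:
    """Turn backslash-octal escapes such as \\357 back into raw bytes."""
    out = bytearray()
    pos = 0
    # Only match escapes that fit in one byte (\000 to \377).
    for match in re.finditer(r"\\([0-3][0-7]{2})", text):
        out += text[pos:match.start()].encode("utf-8")
        out.append(int(match.group(1), 8))
        pos = match.end()
    out += text[pos:].encode("utf-8")
    return bytes(out)

raw = unescape_octal(r"administraci\357\277\275")
print(raw[-3:].hex())        # the three escaped bytes as hex
print(raw.decode("utf-8"))   # ends with U+FFFD, the replacement character
```

The three bytes are EF BF BD, which is the UTF-8 encoding of U+FFFD, so the replacement had already happened before the text reached the shell.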

Reading the tshark docs, I found this:

TShark uses UTF-8 to represent strings internally. In some cases the output might not be valid. For example, a dissector might generate invalid UTF-8 character sequences. Programs reading TShark output should expect UTF-8 and be prepared for invalid output.
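A minimal sketch of what "be prepared for invalid output" could look like for a program that reads TShark's output as bytes. The function name decode_tshark_line and the sample lines are assumptions made for illustration; in practice the bytes would come from something like the stdout pipe of subprocess.Popen.

```python
def decode_tshark_line(raw_line: bytes) -> str:
    """Decode one line of TShark output, tolerating invalid UTF-8."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return raw_line.decode("utf-8", errors="replace").rstrip("\n")

# Hypothetical sample lines: one valid UTF-8, one with a Latin-1 encoded "ó".
good = "SSID=administración\n".encode("utf-8")
bad = b"SSID=administraci\xf3n\n"

print(decode_tshark_line(good))
print(decode_tshark_line(bad))
```

This only keeps the reader from crashing on bad input; it cannot recover a character that was already replaced upstream.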

Is there anything I can do to solve this? Is there another flag I should use, or do I just have to accept that tshark can't handle accented characters at all?

EDIT: Also, can somebody enlighten me on what the "SN" and "FN" fields really mean?

EDIT2: I'm on Linux Mint 19.2, bash 4.4.20. I also ran the command from Python's subprocess.Popen with shell=False, with the same results.


Comments

What OS and what shell are you running this on?

grahamb ( 2019-11-15 18:36:02 +0000 )

I'm on Linux Mint 19.2, bash 4.4.20. I also ran the command from Python's subprocess.Popen with shell=False, with the same results.

flariut ( 2019-11-15 18:41:40 +0000 )

OK, what's your locale, i.e. the output of locale?

grahamb ( 2019-11-15 19:11:07 +0000 )

Output:

LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=es_AR.UTF-8
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=es_AR.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=es_AR.UTF-8
LC_NAME=es_AR.UTF-8
LC_ADDRESS=es_AR.UTF-8
LC_TELEPHONE=es_AR.UTF-8
LC_MEASUREMENT=es_AR.UTF-8
LC_IDENTIFICATION=es_AR.UTF-8
LC_ALL=

flariut ( 2019-11-15 19:41:21 +0000 )

See the discussion here on this subject.

Jaap ( 2019-11-16 14:13:45 +0000 )


Comments

I created an access point with the SSID "administración" and I do get the same behavior (Wireshark/TShark 3.0.5 on macOS). Here is the probe request packet:

0000   00 00 19 00 6f 08 00 00 dc f4 d1 97 00 00 00 00   ....o...........
0010   12 0c 3c 14 40 01 cd a1 01 40 00 00 00 ff ff ff   ..<.@....@......
0020   ff ff ff 6c 8d c1 2e 82 fa ff ff ff ff ff ff b0   ...l............
0030   fd 00 0f 61 64 6d 69 6e 69 73 74 72 61 63 69 c3   ...administraci.
0040   b3 6e 01 08 0c 12 18 24 30 48 60 6c 2d 1a 6f 00   .n.....$0H`l-.o.
0050   17 ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0060   00 00 00 00 00 00 00 00 7f ...


SYN-bit ( 2019-11-16 12:56:43 +0000 )
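The SSID element in that dump starts at offset 0x31 (tag 00, length 0f). A short sketch, with the bytes copied from the dump and a hypothetical helper name parse_ssid_element, shows that the payload on the wire is well-formed UTF-8:

```python
def parse_ssid_element(element: bytes) -> bytes:
    """Return the SSID payload of a tag/length/value element (tag 0 = SSID)."""
    tag, length = element[0], element[1]
    if tag != 0:
        raise ValueError("not an SSID element")
    return element[2:2 + length]

# Bytes 0x31 to 0x41 of the probe request above: tag, length, then 15 SSID octets.
element = bytes.fromhex("00 0f 61 64 6d 69 6e 69 73 74 72 61 63 69 c3 b3 6e")

ssid = parse_ssid_element(element)
print(len(ssid))             # 15 octets, matching the length field 0x0f
print(ssid.decode("utf-8"))  # strict decode succeeds
```

The "ó" is sent as the two octets c3 b3, so in this capture the frame itself carries valid UTF-8.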

IEEE Std 802.11-2016 says, in section 9.4.2.2 "SSID element":

When the UTF-8 SSID subfield of the Extended Capabilities element is equal to 1 in the frame that includes the SSID element, or the Extended Capabilities of the source of the SSID information is known to include the UTF-8 SSID capability based on a previously received Extended Capabilities element, the SSID is interpreted using UTF-8 encoding. Otherwise, the character encoding of the octets in this SSID element is unspecified.

So if the UTF-8 SSID subfield of the Extended Capabilities element is not equal to 1 in the frame that includes the SSID element, and we didn't see a previously received Extended Capabilities element indicating that the sender has that capability, we'd need to use a heuristic to determine whether the SSID is UTF-8 or not.

We already have a way of specifying a "maybe ASCII ...(more)

Guy Harris ( 2019-11-17 08:33:46 +0000 )
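The decision described in that comment can be sketched roughly as follows. This is not Wireshark's implementation: the function name interpret_ssid, the utf8_ssid_flag parameter (standing in for "the UTF-8 SSID capability is known for this sender") and the escape-based fallback are all assumptions chosen for illustration.

```python
def interpret_ssid(ssid: bytes, utf8_ssid_flag: bool) -> str:
    """Pick a text interpretation for SSID octets (illustrative heuristic only)."""
    if utf8_ssid_flag:
        # The sender declared UTF-8, so decode as UTF-8 and mark any damage.
        return ssid.decode("utf-8", errors="replace")
    try:
        # Encoding unspecified: assume octets that form valid UTF-8 are UTF-8.
        return ssid.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: keep ASCII and show other octets as \xNN escapes.
        return ssid.decode("ascii", errors="backslashreplace")

print(interpret_ssid(b"administraci\xc3\xb3n", utf8_ssid_flag=False))
print(interpret_ssid(b"administraci\xf3n", utf8_ssid_flag=False))
```

Escaping in the fallback branch keeps the original octets visible, whereas substituting U+FFFD would discard them.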

Done: Bug 16208

SYN-bit ( 2019-11-17 09:22:08 +0000 )
