tshark - SSID name has UTF-8 replacement character?

asked 2019-11-15 17:37:16 +0000

flariut

updated 2019-11-21 15:11:12 +0000

grahamb

Hi there, I'm having a problem with this command in tshark, trying to get 802.11 probe requests in monitor mode:

tshark -o nameres.mac_name:FALSE -l -I -i wlan0mon -Y "wlan.ssid != 0" "wlan type mgt subtype 0100" > ./tshark_output

Everything works, except that some lines look like this:

 7067 2122.734629754 e0:98:61:xx:xx:xx → ff:ff:ff:ff:ff:ff 802.11 144 Probe Request, SN=615, FN=0, Flags=........C, SSID=administraci\357\277\275

That SSID should (probably) read "administración", and as you can see, tshark is replacing the "ó" with the UTF-8 encoding of the Unicode replacement character.

Reading the tshark docs, I found this:

TShark uses UTF-8 to represent strings internally. In some cases the output might not be valid. For example, a dissector might generate invalid UTF-8 character sequences. Programs reading TShark output should expect UTF-8 and be prepared for invalid output.

Is there anything I can do to solve this? Is there another flag, or should I just accept that tshark can't handle accent marks at all?

EDIT: Also, can somebody enlighten me on what the "SN" and "FN" values really mean?

EDIT2: I'm on Linux Mint 19.2, bash 4.4.20. I also ran the command from Python's subprocess.Popen with shell=False, with the same results.
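Since the question mentions driving tshark from Python, here is a minimal sketch of what "be prepared for invalid output" could look like on the reading side. The helper names `decode_tshark_line` and `stream_tshark` are made up for illustration. This only keeps the reader from crashing on bad bytes; it does not change what tshark prints.

```python
import subprocess
from typing import Iterator, List


def decode_tshark_line(raw: bytes, errors: str = "replace") -> str:
    """Decode one line of TShark output, tolerating invalid UTF-8.

    errors="replace" turns each undecodable sequence into U+FFFD;
    errors="backslashreplace" keeps the offending bytes visible as \\xNN.
    """
    return raw.decode("utf-8", errors=errors).rstrip("\r\n")


def stream_tshark(cmd: List[str]) -> Iterator[str]:
    """Yield decoded lines from a running tshark process (hypothetical helper)."""
    # Read stdout as raw bytes so that decoding stays under our control.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        assert proc.stdout is not None
        for raw in proc.stdout:
            yield decode_tshark_line(raw)
```

`stream_tshark` would be called with the same arguments as the command above, passed as a list, for example `["tshark", "-l", "-I", "-i", "wlan0mon", ...]`.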

Comments

What OS and what shell are you running this on?

grahamb (2019-11-15 18:36:02 +0000)

I'm on Linux Mint 19.2, bash 4.4.20. I also ran the command from Python's subprocess.Popen with shell=False, with the same results.

flariut (2019-11-15 18:41:40 +0000)

OK, what's your locale, i.e. the output of locale?

grahamb (2019-11-15 19:11:07 +0000)

output:

LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=es_AR.UTF-8
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=es_AR.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=es_AR.UTF-8
LC_NAME=es_AR.UTF-8
LC_ADDRESS=es_AR.UTF-8
LC_TELEPHONE=es_AR.UTF-8
LC_MEASUREMENT=es_AR.UTF-8
LC_IDENTIFICATION=es_AR.UTF-8
LC_ALL=

flariut (2019-11-15 19:41:21 +0000)

See the discussion here on this subject.

Jaap (2019-11-16 14:13:45 +0000)

1 Answer

answered 2019-11-16 10:58:16 +0000

SYN-bit

The byte sequence \357\277\275 (0xEF 0xBF 0xBD) is the UTF-8 encoding of the Unicode character U+FFFD, the REPLACEMENT CHARACTER (used to replace an unknown, unrecognized or unrepresentable character). Looking at the source code, Wireshark uses this character when there was a problem reading a UTF-8 character (either because a multi-byte sequence was cut off in the middle or because a non-UTF-8 byte sequence was encountered).

Are you able to provide a capture file with this particular probe request? I would like to check whether there was a problem interpreting the SSID name or whether there was indeed an invalid character in the SSID name.
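To make the byte arithmetic concrete, here is a small sketch that turns the octal escapes from the output line back into bytes and decodes them. The helper name `unescape_octal` is made up for this example. It also shows Python's own decoder producing U+FFFD for a cut-off sequence; that is Python's behaviour and only an analogy for what Wireshark does.

```python
import re

# Matches a backslash followed by three octal digits that fit in one byte.
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def unescape_octal(text: str) -> bytes:
    """Turn escapes such as \\357 in a printed string back into raw bytes."""
    raw = text.encode("ascii")
    return _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)


escaped = r"administraci\357\277\275"
raw_bytes = unescape_octal(escaped)
decoded = raw_bytes.decode("utf-8")

print(raw_bytes.hex(" "))        # the last three bytes are ef bf bd
print(hex(ord(decoded[-1])))     # code point of the final character

# A two-byte sequence cut off after its first byte also decodes to U+FFFD
# when the decoder is told to replace errors instead of raising.
truncated = b"administraci\xc3".decode("utf-8", errors="replace")
```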

Comments

I created an access point with the SSID "administración" and indeed I get the same behavior (Wireshark/TShark 3.0.5 on macOS). Here is the probe request packet:

0000   00 00 19 00 6f 08 00 00 dc f4 d1 97 00 00 00 00   ....o...........
0010   12 0c 3c 14 40 01 cd a1 01 40 00 00 00 ff ff ff   ..<.@....@......
0020   ff ff ff 6c 8d c1 2e 82 fa ff ff ff ff ff ff b0   ...l............
0030   fd 00 0f 61 64 6d 69 6e 69 73 74 72 61 63 69 c3   ...administraci.
0040   b3 6e 01 08 0c 12 18 24 30 48 60 6c 2d 1a 6f 00   .n.....$0H`l-.o.
0050   17 ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0060   00 00 00 00 00 00 00 00 7f ...
SYN-bit (2019-11-16 12:56:43 +0000)
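As a cross-check of the dump above, the following sketch pulls the SSID element out of the first 80 bytes by hand. It assumes a radiotap header whose length sits at offset 2, a plain 24-byte management header, and the SSID as the first tagged element; the variable names are mine. The bytes on the wire decode cleanly as UTF-8, so the replacement character is introduced at display time. The Sequence Control field is read as well because, as far as I know, that is where the SN (sequence number) and FN (fragment number) values in the Info column come from.

```python
import struct

# First 80 bytes of the hex dump above (offsets 0x00-0x4f).
PACKET = bytes.fromhex(
    "00 00 19 00 6f 08 00 00 dc f4 d1 97 00 00 00 00 "
    "12 0c 3c 14 40 01 cd a1 01 40 00 00 00 ff ff ff "
    "ff ff ff 6c 8d c1 2e 82 fa ff ff ff ff ff ff b0 "
    "fd 00 0f 61 64 6d 69 6e 69 73 74 72 61 63 69 c3 "
    "b3 6e 01 08 0c 12 18 24 30 48 60 6c 2d 1a 6f 00"
)

# Radiotap: version (1 byte), pad (1 byte), header length (2 bytes, little-endian).
radiotap_len = struct.unpack_from("<H", PACKET, 2)[0]
frame = PACKET[radiotap_len:]

# Frame Control: type in bits 2-3, subtype in bits 4-7.
frame_control = struct.unpack_from("<H", frame, 0)[0]
frame_type = (frame_control >> 2) & 0x3
frame_subtype = (frame_control >> 4) & 0xF

# Sequence Control sits at offset 22 of the management header:
# low 4 bits are the fragment number, high 12 bits the sequence number.
seq_ctrl = struct.unpack_from("<H", frame, 22)[0]
fragment_number = seq_ctrl & 0xF
sequence_number = seq_ctrl >> 4

# A probe request has no fixed fields, so tagged elements start at offset 24.
tag_id, tag_len = frame[24], frame[25]
ssid_bytes = frame[26:26 + tag_len]
ssid = ssid_bytes.decode("utf-8")  # strict decode: raises if the bytes are invalid

print(frame_type, frame_subtype, sequence_number, fragment_number, ssid)
```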

IEEE Std 802.11-2016 says, in section 9.4.2.2 "SSID element":

When the UTF-8 SSID subfield of the Extended Capabilities element is equal to 1 in the frame that includes the SSID element, or the Extended Capabilities of the source of the SSID information is known to include the UTF-8 SSID capability based on a previously received Extended Capabilities element, the SSID is interpreted using UTF-8 encoding. Otherwise, the character encoding of the octets in this SSID element is unspecified.

So if the UTF-8 SSID subfield of the Extended Capabilities element is not equal to 1 in the frame that includes the SSID element, and we didn't see a previously received Extended Capabilities element indicating that the sender has that capability, we'd need to use a heuristic to determine whether the SSID is UTF-8 or not.

We already have a way of specifying a "maybe ASCII ...

Guy Harris (2019-11-17 08:33:46 +0000)
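One possible shape for the heuristic described above, as a rough sketch only and not what Wireshark implements: trust UTF-8 when the sender advertises the capability, otherwise try a strict UTF-8 decode and fall back to escaping the non-ASCII bytes. The function names are invented, and the bit position of the UTF-8 SSID flag (bit 48 of the Extended Capabilities field) is an assumption from my reading of the standard that should be verified.

```python
from typing import Optional

# ASSUMPTION: UTF-8 SSID is bit 48 of the Extended Capabilities field.
UTF8_SSID_BIT = 48


def has_utf8_ssid_capability(ext_caps: bytes) -> bool:
    """Return True if the (assumed) UTF-8 SSID bit is set in the field."""
    byte_index, bit_index = divmod(UTF8_SSID_BIT, 8)
    if len(ext_caps) <= byte_index:
        return False
    return bool(ext_caps[byte_index] & (1 << bit_index))


def ssid_to_text(ssid: bytes, ext_caps: Optional[bytes] = None) -> str:
    """Render SSID octets as text using a simple encoding heuristic."""
    if ext_caps is not None and has_utf8_ssid_capability(ext_caps):
        # The sender says the SSID is UTF-8, so bad bytes are genuine errors.
        return ssid.decode("utf-8", errors="replace")
    try:
        # Encoding unspecified: accept the SSID only if it is valid UTF-8.
        return ssid.decode("utf-8")
    except UnicodeDecodeError:
        # Otherwise show ASCII as-is and escape everything else as \xNN.
        return ssid.decode("ascii", errors="backslashreplace")
```

With this approach the SSID from the capture above would be shown as "administración" even without the capability bit, while a Latin-1 encoded SSID would keep its raw byte visible as `\xf3` instead of collapsing to U+FFFD.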

Done: Bug 16208

SYN-bit (2019-11-17 09:22:08 +0000)
