This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Display filters using (tshark -Y) problem

Hello!

  1. This command works correctly:

    "C:\Program Files\Wireshark\tshark.exe" -Y "gsm_sms.sms_text contains "test"" -r "F:\Temp\pcapfile" -w "F:\Temp\resultfile.pcap"

  2. This command doesn't work (tshark: "'" was unexpected in this context.):

    "C:\Program Files\Wireshark\tshark.exe" -Y "gsm_sms.sms_text contains "тест"" -r "F:\Temp\pcapfile" -w "F:\Temp\resultfile.pcap"

The only difference is the string in the display filter: "test" (English text) versus "тест" (Russian text).

How can I fix it? Wireshark (not tshark) handles both display filters properly.

asked 06 May '14, 21:59

factorial

edited 07 May '14, 07:26

cmaynard ♦♦


One Answer:

I'm not sure if this will work or not, but you could try specifying the hex bytes instead of the text.

For example, instead of -Y "gsm_sms.sms_text contains "test"", you could write, -Y "gsm_sms.sms_text contains 74:65:73:74".

I'm not sure if this is correct, but instead of -Y "gsm_sms.sms_text contains "тест"" you could try -Y "gsm_sms.sms_text contains d1:82:d0:b5:d1:81:d1:82" ... at least those are the bytes that show up when I dump тест into a text file and run hexdump on it.
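As a rough illustration of where those byte values come from, the following Python sketch encodes a string and prints the bytes in the colon-separated form used in the filters above. The helper name to_hex_bytes is made up for this example, and the choice of UTF-8 is an assumption based on the hexdump output quoted above.

```python
def to_hex_bytes(text: str, encoding: str = "utf-8") -> str:
    """Encode text and return its bytes as colon-separated hex (hypothetical helper)."""
    return ":".join(f"{b:02x}" for b in text.encode(encoding))


print(to_hex_bytes("test"))  # 74:65:73:74
print(to_hex_bytes("тест"))  # d1:82:d0:b5:d1:81:d1:82
```

Each Cyrillic letter takes two bytes in UTF-8, which is why the four-letter word becomes eight bytes.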

answered 07 May '14, 07:56

cmaynard ♦♦

Looks like a Unicode issue. Is it that the Russian text is a multibyte-character string and your cmd shell isn't handling it?

(07 May '14, 08:22) grahamb ♦

I didn't know about this option. What's the conception of conversion text to hex and vice versa? I'll try your advice tomorrow and report the result.

(07 May '14, 08:29) factorial

What's the conception of conversion text to hex and vice versa?

Sorry, but I'm not sure I understand your question.

(07 May '14, 08:33) cmaynard ♦♦

Sorry, but I'm not sure I understand your question.

No, it's my fault. English isn't my native language, sorry :) I'd like to know how you obtained 74:65:73:74 from "test" and d1:82:d0:b5:d1:81:d1:82 from "тест".

Looks like a Unicode issue, the Russian text is a multibyte-character string and your cmd shell isn't handling that.

I thought about that too, but I couldn't find a solution on the Internet. Do you have any advice?

(07 May '14, 08:59) factorial

"test" is the four ASCII characters with values 0x74, 0x65, 0x73, 0x74. Using the method described in the answer, @cmaynard found the bytes he lists for the Russian text.

A further complication is how the Russian text is encoded in the packet you are looking at. Do you know the encoding used?

(07 May '14, 09:11) grahamb ♦

Do you know the encoding used?

UCS-2. Maybe I could use a command format something like tshark ... -Y "gsm_sms.sms_text contains ASCII="test"" or tshark ... -Y "gsm_sms.sms_text contains UCS-2="test""?

(07 May '14, 10:02) factorial

If the data in the packet is in UCS-2, then you could determine the UCS-2 code points for your Russian characters and use those as the set of bytes to search for.

(07 May '14, 12:56) grahamb ♦
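A minimal Python sketch of that idea, assuming big-endian UCS-2 (which coincides with Python's utf-16-be codec for characters in the Basic Multilingual Plane). The helper name ucs2_hex_bytes is hypothetical. Whether the gsm_sms.sms_text field is matched against these raw bytes or against an already decoded string is not something this sketch settles.

```python
def ucs2_hex_bytes(text: str) -> str:
    """Return the big-endian UCS-2 bytes of text as colon-separated hex."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            # UCS-2 can only represent code points up to U+FFFF.
            raise ValueError(f"{ch!r} is outside the BMP and has no UCS-2 form")
    return ":".join(f"{b:02x}" for b in text.encode("utf-16-be"))


word = "тест"
print([f"U+{ord(ch):04X}" for ch in word])  # ['U+0442', 'U+0435', 'U+0441', 'U+0442']
print(ucs2_hex_bytes(word))                 # 04:42:04:35:04:41:04:42
```

Note that each code point becomes two full bytes, so U+0442 is written 04:42 in the byte-sequence syntax, not 442.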

I've checked tshark -Y "gsm_sms.sms_text contains 74:65:73:74" and -Y "gsm_sms.sms_text contains d1:82:d0:b5:d1:81:d1:82". Both work properly. Now I understand how the ASCII bytes 74:65:73:74 were obtained, but I can't understand how "тест" was converted to d1:82:d0:b5:d1:81:d1:82. UCS-2 is the format used for encoding SMS messages. I suppose that Wireshark decodes it and stores the field "gsm_sms.sms_text" in some other encoding, because in UCS-2 "тест" is 442:435:441:442 and that doesn't work. Christopher, could you please explain how d1:82:d0:b5:d1:81:d1:82 was obtained from "тест"?

(07 May '14, 18:52) factorial

Answering my own question: d1:82:d0:b5:d1:81:d1:82 is the UTF-8 encoding of "тест".

(07 May '14, 20:58) factorial

OK, so there is a workable solution: use the hex form of the text in the gsm_sms.sms_text filter. It isn't convenient, but I'll be able to write a converter to use in a script. Thanks to Christopher and Graham!

(07 May '14, 21:08) factorial
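One possible shape for such a converter, sketched in Python. It assumes the UTF-8 byte form that was reported to work earlier in this thread; the function name build_tshark_args is hypothetical, and the file paths are simply the ones from the question. The sketch only builds the argument list and does not run tshark.

```python
def build_tshark_args(text, infile, outfile, tshark="tshark"):
    """Build a tshark argument list that filters on the UTF-8 bytes of text."""
    hex_bytes = ":".join(f"{b:02x}" for b in text.encode("utf-8"))
    display_filter = f"gsm_sms.sms_text contains {hex_bytes}"
    return [tshark, "-Y", display_filter, "-r", infile, "-w", outfile]


args = build_tshark_args("тест", r"F:\Temp\pcapfile", r"F:\Temp\resultfile.pcap")
print(args[2])  # gsm_sms.sms_text contains d1:82:d0:b5:d1:81:d1:82

# To run it (assumes tshark is on PATH):
# import subprocess; subprocess.run(args, check=True)
```

Because the filter contains only ASCII characters, the shell's code page no longer affects how it is passed to tshark.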

Well, there might be an easier way, but I'm glad you've found at least one solution. Instead of searching for hex bytes though, maybe you could switch code pages first, perhaps via chcp.com or something like it? Could you post a small capture file to cloudshark, one that contains the тест text? I'm curious if it'll work in my console, with codepage 437.

(08 May '14, 07:02) cmaynard ♦♦
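As a small aside on why the code page might matter, this Python sketch checks which encodings can represent the Cyrillic string at all. It is only an illustration of representability, under the assumption that Python's cp437, cp866 and cp1251 codecs correspond to the console code pages of the same numbers. It says nothing about how cmd.exe or tshark actually pass the argument.

```python
def representable(text: str, codepage: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


for cp in ("cp437", "cp866", "cp1251", "utf-8"):
    print(cp, representable("тест", cp))
```

Code page 437 has no Cyrillic letters, so the string cannot be represented there, while 866, 1251 and UTF-8 (code page 65001 on Windows) can all hold it.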