# How to properly use heuristic dissector for TCP

I am trying to implement a heuristic dissector for TCP packets.

README.heuristic says:

"Once a packet for a particular "connection" has been identified as belonging to a particular protocol, Wireshark should then be set up to always directly call the dissector for that protocol. This removes the overhead of having to identify each packet of the connection heuristically."

I am not sure what "should be set up to always directly call the dissector for that protocol" means. Does it mean that once the first packet is identified, the subsequent packets in the TCP stream will automatically be identified correctly?

I have tried to implement a heuristic dissector for TCP that only identifies the first incoming packet of the stream I want. It seems that it only identifies the first packet but not the subsequent ones in the stream.

If the heuristic for TCP is applied to every packet individually, I am not sure the sample code for the heuristic dissector makes sense. The sample code seems to check the first few bytes of each packet individually. However, this may not work in real life, because a long message may be split arbitrarily across several TCP segments. If the heuristic is done this way, will some of those segments be misidentified?


> I am not sure what "should be set up to always directly call the dissector for that protocol" means. Does it mean that once the first packet is identified, the subsequent packets in the TCP stream will automatically be identified correctly?

No.

"Should" has multiple meanings; there's "Used to indicate obligation, duty, or correctness, typically when criticizing someone's actions.", as in "I think we should trust our people more", and there's "Used to indicate what is probable.", as in "the bus should arrive in a few minutes". You're reading it in the latter sense; it was intended in the former sense.

That sentence should probably be changed to "Wireshark must be then set up..." to avoid the ambiguity.

The way you set it up is to arrange that there's a "conversation" for the TCP connection, and assign a non-heuristic version of your dissector as the dissector for that conversation; see the sample dissect_PROTOABBREV_heur_tcp() routine in the README.heuristic file - it does exactly that.
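To make the mechanism concrete, here is a minimal, self-contained C model of it. It is only a sketch and does not use the Wireshark API: every name in it (`bound`, `dissect_proto`, `dissect_proto_heur`, `handle_segment`, the "HELO" magic) is hypothetical. The assignment to `bound[stream]` stands in for what the README.heuristic sample does with a conversation, and `handle_segment()` assumes that a dissector bound to the conversation is tried before any heuristic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model; none of these names are Wireshark API. */
#define MAX_STREAMS 16

typedef int (*dissector_fn)(const uint8_t *data, size_t len);

static dissector_fn bound[MAX_STREAMS]; /* per-stream dissector, NULL if none */
int heuristic_calls;                    /* how often the heuristic ran */
int dissected;                          /* packets handled by our dissector */

/* The non-heuristic dissector: accepts any payload of its stream. */
static int dissect_proto(const uint8_t *data, size_t len)
{
    (void)data;
    dissected++;
    return (int)len;
}

/* Heuristic: in this toy protocol only the first packet starts with "HELO". */
static int dissect_proto_heur(int stream, const uint8_t *data, size_t len)
{
    heuristic_calls++;
    if (len < 4 || memcmp(data, "HELO", 4) != 0)
        return 0;                       /* not ours */
    bound[stream] = dissect_proto;      /* bind the dissector to the stream */
    dissect_proto(data, len);
    return 1;
}

/* Per-payload dispatch: a bound dissector wins, otherwise try the heuristic. */
int handle_segment(int stream, const uint8_t *data, size_t len)
{
    assert(stream >= 0 && stream < MAX_STREAMS);
    if (bound[stream] != NULL) {
        bound[stream](data, len);
        return 1;
    }
    return dissect_proto_heur(stream, data, len);
}
```

In this model, later packets of a stream that do not start with "HELO" are still dissected, and the heuristic runs once per stream rather than once per packet. Without the binding step, only the first packet would be recognized, which matches the behavior described in the question.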


Thanks for the clarification!

( 2019-07-17 05:15:06 +0000 )

> That sentence should probably be changed to "Wireshark must be then set up..." to avoid the ambiguity.

Hmm, there's a case where "should" and "must" aren't exactly the same - "That sentence must be changed..." is stronger than "That sentence should be changed", and maybe a case could be made for not changing it.

But in the document, it really, well, should be "must"-level strong, so I've changed it (and another instance in the same document).

( 2020-02-23 04:39:53 +0000 )

There may be a misunderstanding here, so I will answer this part of the question.

> However, this may not work in real life because TCP packets may be fragmented arbitrarily by hardware if the information sent is too long.

The issue here is probably not the "conversation" but TCP reassembly. A message that has been split across several TCP segments needs to be reassembled first and then dissected.

See the answers to the following question for details:

How to parse the tcp data with fragments in lua
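As a rough illustration of "reassemble first, then dissect", here is a self-contained C sketch. It assumes a hypothetical framing (a 2-byte big-endian body length followed by the body) and does not use Wireshark's own reassembly support; in a real C dissector this job is normally delegated to `tcp_dissect_pdus()` or to the `pinfo->desegment_len` mechanism. The names `stream_buf` and `feed_segment` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical framing: 2-byte big-endian body length, then the body. */
#define HDR_LEN 2
#define BUF_CAP 4096

typedef struct {
    uint8_t buf[BUF_CAP]; /* bytes received but not yet consumed */
    size_t  used;
} stream_buf;

/* Append one TCP segment's payload, then consume every complete PDU.
 * Returns the number of complete PDUs found; a partial PDU stays buffered. */
int feed_segment(stream_buf *s, const uint8_t *seg, size_t len)
{
    int pdus = 0;
    size_t off = 0;

    assert(len <= BUF_CAP - s->used);   /* sketch: no overflow handling */
    memcpy(s->buf + s->used, seg, len);
    s->used += len;

    while (s->used - off >= HDR_LEN) {
        size_t body = ((size_t)s->buf[off] << 8) | s->buf[off + 1];
        if (s->used - off < HDR_LEN + body)
            break;                      /* incomplete: wait for more data */
        /* A real dissector would dissect buf[off .. off+HDR_LEN+body) here. */
        off += HDR_LEN + body;
        pdus++;
    }
    memmove(s->buf, s->buf + off, s->used - off); /* keep the partial tail */
    s->used -= off;
    return pdus;
}
```

Feeding a 7-byte message as a 4-byte segment followed by the remaining 3 bytes yields no PDU after the first call and one PDU after the second. The point is that message boundaries are found in the reassembled byte stream, not by looking at the first bytes of each segment.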

> Wireshark should then be set up to always directly call the dissector for that protocol.

This means Wireshark should be set up with the "Try heuristic sub-dissectors first" preference disabled.

This paragraph is the summary of the first to fourth paragraphs of this section. [EDITED]

For example, if the tcp.port changes or conflicts with another protocol, a dissector registered for a particular tcp.port will not work properly.

In this case, you need to inspect all TCP packets and determine the protocol in another way (a specific character string or byte sequence in the header, the MAC address, etc.).

This is a heuristic dissector. However, TCP has many heuristic dissectors, and each of them inspects every TCP packet, so analysis may take too much time when inspecting large capture files. That is why the option is disabled by default.


Here is an example where the Heuristic dissector is not called: Heuristic dissector not called

( 2020-02-23 03:00:53 +0000 )