This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Issue about capturing HTTP

I ran into a problem while using Wireshark. I only want to capture HTTP packets by setting a filter, but when I enter HTTP as the filter condition while configuring the network card, the field turns red, meaning the setting is wrong. When I enter TCP etc., it is shown as valid. Please tell me what the reason is. Although the filter condition can be set to tcp port 80, that only captures HTTP that passes through port 80. If some HTTP traffic does not use port 80, how can I capture it?

asked 20 Dec '12, 22:52

jun

edited 20 Dec '12, 23:07


One Answer:

it showed right. please tell me what is the reason.

The reason is already explained in your other question with the same content. I'll repeat it for you:

You cannot use http as a capture filter, because it is not valid libpcap filter syntax, whereas tcp is a valid filter.

See here: http://www.manpagez.com/man/7/pcap-filter/

Please use this filter instead: tcp port 80

Although the filter condition can be set to tcp port 80, that only captures HTTP that passes through port 80. If some HTTP traffic does not use port 80, how can I capture it?

Wireshark needs a criterion to identify a protocol during the capture phase. That criterion is usually the transport protocol and the port (80, 3128, 8080, etc.). So, if you want to capture HTTP with libpcap regardless of the port, you can only try to identify the usual HTTP request commands in the TCP payload.

Looking for 'GET ' in the payload:

tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420

Looking for 'POST' in the payload:

tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x504F5354

Explanation:

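  • ((tcp[12:1] & 0xf0) >> 2) is the length of the TCP header in bytes: the upper 4 bits of byte 12 hold the header length in 32-bit words, and shifting right by 2 (instead of 4) multiplies that value by 4.
  • tcp[header_length:4] = 0x47455420 reads 4 bytes of the TCP segment, beginning right after the header, i.e. the first 4 bytes of the payload. Those 4 bytes are compared to the ASCII representation of 'GET ' (including the trailing space).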
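To make the two steps above more concrete, here is a minimal Python sketch that performs the same arithmetic on raw bytes. It is only an illustration of what the filter expression computes, not how libpcap implements it; the function names and the hand-built 20-byte header (ports, window size, flags) are made up for the example.

```python
import struct

def tcp_header_length(segment: bytes) -> int:
    """Header length in bytes, same arithmetic as (tcp[12:1] & 0xf0) >> 2."""
    # Byte 12 carries the data offset (header length in 32-bit words)
    # in its upper 4 bits. (b & 0xf0) >> 4 is the word count; shifting
    # by 2 instead of 4 multiplies it by 4, giving bytes.
    return (segment[12] & 0xF0) >> 2

def payload_starts_with(segment: bytes, token: bytes) -> bool:
    """Compare the first payload bytes with token, like tcp[hlen:4] = 0x..."""
    hlen = tcp_header_length(segment)
    return segment[hlen:hlen + len(token)] == token

# Hypothetical TCP header without options (data offset = 5 words = 20 bytes),
# followed by the start of an HTTP request. All field values are invented.
header = struct.pack("!HHIIBBHHH", 50000, 8080, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
segment = header + b"GET / HTTP/1.1\r\n\r\n"

print(tcp_header_length(segment))                 # header length in bytes
print(payload_starts_with(segment, b"GET "))      # does the payload begin with 'GET '?
print(hex(int.from_bytes(b"GET ", "big")))        # the constant used in the filter
```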

So, if you want to look for all HTTP commands, you need to combine several of these filters.

tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420 or tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x504F5354 or xxxxx

Replace xxxxx with the filters for HEAD and other HTTP commands.
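If you do not want to work out the hex values by hand, a small helper can generate them. The following Python sketch builds the combined filter string from a list of method names; the helper names are hypothetical, and it assumes that comparing only the first 4 bytes is acceptable (libpcap can only load 1, 2 or 4 bytes at once), so longer methods such as DELETE are matched by their first 4 letters.

```python
def method_filter(method: str) -> str:
    """Build the capture filter for one HTTP method (hypothetical helper)."""
    # Pad with a space (as in 'GET ') and keep the first 4 bytes only.
    token = (method + " ")[:4].encode("ascii")
    value = int.from_bytes(token, "big")
    return f"tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x{value:08X}"

def http_request_filter(methods) -> str:
    """Join the per-method filters with 'or'."""
    return " or ".join(method_filter(m) for m in methods)

print(http_request_filter(["GET", "POST", "HEAD"]))
```

The resulting string can be pasted into the capture filter field or passed to a capture tool that accepts libpcap filters.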

IMPORTANT: There are some problems

the most important problem

  • With the filters shown above, you will only capture the packets that contain the start of an HTTP request, not the responses or any further packets of the connection. As you are interested in the HTTP payload, this method does not work for you.

other problems

  • Some HTTP implementations accept 'get' or 'Get' instead of 'GET', so you will probably miss some HTTP requests, unless you add filters for all possible lowercase/uppercase combinations.
  • You will get false positives if the strings 'GET', 'POST', 'HEAD', etc. happen to be at the beginning of the TCP payload (e.g. as part of a text file that is transferred via FTP).
  • You may get performance problems if you need to capture at high packet rates.
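To get a feeling for how many filters the lowercase/uppercase problem would require, here is a rough Python sketch that enumerates the case variants of a 4-byte token and their hex values. The function name is made up for this example; it only illustrates the combinatorics and is not a recommendation to actually use such a filter.

```python
from itertools import product

def case_variants(token: str):
    """Return every upper/lowercase spelling of token (hypothetical helper)."""
    # Letters contribute two choices each; a space or digit only one.
    choices = [sorted({c.lower(), c.upper()}) for c in token]
    return ["".join(p) for p in product(*choices)]

for token in ("GET ", "POST"):
    variants = case_variants(token)
    print(token, len(variants), "variants")
    for v in variants:
        # Hex value that would go on the right-hand side of the filter.
        print("  ", repr(v), f"0x{int.from_bytes(v.encode('ascii'), 'big'):08X}")
```

Each letter doubles the number of comparisons, so covering all methods this way makes the filter expression grow quickly.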

So, if you need the whole HTTP payload for all HTTP connections, regardless of the port, you cannot do that with libpcap filters (Wireshark capture filters). You can only capture all data and later use display filters to extract the HTTP sessions.

A possible alternative would be one of these commands:

tshark -ni 0 -R "http" -V
tshark -ni 0 -R "http" -T pdml

However, you cannot write that data into a pcap file (-w is not supported together with -R), so you need to analyze the output of tshark with tools other than Wireshark.

HINT: tshark will also not detect HTTP on ports other than the default port list of the HTTP dissector: 80, 3128, 3132, 5985, 8080, 8088, 11371, 1900, 2869.

If you tell us more about your plans (why do you need to capture the HTTP payload regardless of the port?), we might be able to find a different solution.

Regards
Kurt

answered 20 Dec '12, 23:23

Kurt Knochner ♦

edited 21 Dec '12, 10:45