
Extracting individual HTTP Response Body with tshark

asked 2019-05-17 14:59:47 +0000

pmqs

updated 2019-05-17 15:02:38 +0000

I'm writing a script to locate and extract specific HTTP response bodies from a pcap file.

The script works in two steps. The first part locates the HTTP transactions I'm interested in; I want to extract the HTTP response body from a subset of those transactions. This part is fine: I've located the http.response_number of the HTTP objects I want to extract.
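Here is a minimal Python sketch of that first step. It assumes Python 3.7+ and tshark on the PATH. It lists every HTTP response as (frame, tcp.stream, http.response_number, content type) by running tshark with -Y http.response and -T fields. The HttpResponse type and the function names are hypothetical. The parser also assumes one HTTP response per frame; a frame carrying several would print comma-separated values, and int() would reject them.

```python
import subprocess
from typing import List, NamedTuple


class HttpResponse(NamedTuple):
    frame: int
    stream: int
    response_number: int
    content_type: str


def parse_response_list(text: str) -> List[HttpResponse]:
    """Parse tab-separated `-T fields` output into HttpResponse records."""
    responses = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # tshark separates fields with a tab by default.
        cols = line.split("\t")
        cols += [""] * (4 - len(cols))  # pad missing trailing fields
        # Assumes a single response per frame (no comma-separated values).
        responses.append(
            HttpResponse(int(cols[0]), int(cols[1]), int(cols[2]), cols[3])
        )
    return responses


def list_http_responses(pcap: str) -> List[HttpResponse]:
    """Run tshark once and list every HTTP response in the capture."""
    cmd = [
        "tshark", "-r", pcap,
        "-Y", "http.response",
        "-T", "fields",
        "-e", "frame.number",
        "-e", "tcp.stream",
        "-e", "http.response_number",
        "-e", "http.content_type",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_response_list(out)
```

The (stream, response_number) pairs that come out of this are what the second step needs.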

The part that I'm having difficulty with is using the http.response_number to extract the HTTP response body using tshark.

The closest I've found is the --export-objects option, but I can't get it to filter on a specific http.response_number:

tshark -r capture.pcap --export-objects http,objs http.response_number eq 1

The output to stdout suggests that the filter is selecting what I want:

  994   1.809557 xx.xx.xx.xx → 10.20.228.39 HTTP/XML 773 HTTP/1.1 200 OK

but I see every object from the pcap written to disk.

Does anyone know if this is possible? I'm running tshark 2.6.1.


2 Answers


answered 2019-05-20 07:36:46 +0000

pmqs

updated 2019-05-20 14:09:08 +0000

Answering my own question: after some trial and error, I found that the field http.file_data is what I'm looking for:

tshark -r capture.pcap -T fields -e http.file_data http.response_number eq 1 and tcp.stream eq 4
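The same command can be wrapped in a small Python helper. This is a sketch that assumes tshark is on the PATH. It passes the filter with -Y (tshark's display-filter option) rather than as trailing arguments. It also combines tcp.stream with http.response_number so that exactly one response is selected. The helper names are hypothetical. The returned text is whatever tshark prints, which, as the comments on this answer discuss, may not be byte-identical to the exported object.

```python
import subprocess
from typing import List


def file_data_command(pcap: str, stream: int, response_number: int) -> List[str]:
    """Build the tshark argv that prints http.file_data for one response."""
    # tcp.stream + http.response_number together pick a single response,
    # because http.response_number is counted per TCP connection.
    display_filter = (
        f"tcp.stream eq {stream} and http.response_number eq {response_number}"
    )
    return [
        "tshark", "-r", pcap,
        "-Y", display_filter,
        "-T", "fields",
        "-e", "http.file_data",
    ]


def extract_file_data(pcap: str, stream: int, response_number: int) -> str:
    """Run tshark and return http.file_data exactly as tshark printed it."""
    cmd = file_data_command(pcap, stream, response_number)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, extract_file_data("capture.pcap", 4, 1) runs the command line above with the filter supplied via -Y.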

The only documentation I can find for http.file_data is here. All it says is:

http.file_data  File Data   Character string    2.2.0 to 3.0.1

Is there a better definition somewhere? I may not be looking in the correct place.


Comments

Does it work for you in tshark?

I tried a couple of traces with versions 2.6.8 and 3.0.1, and I think it might not do what you expect. In Wireshark, selecting this field and exporting its data does indeed result in a proper HTTP object. However, I do not think you can use -T fields to properly export the data of the HTTP payload.

I think it was made for the Export Objects menu item, as can be seen in the source code:

        /* Save values for the Export Object GUI feature if we have
         * an active listener to process it (which happens when
         * the export object window is open). */
        if(have_tap_listener(http_follow_tap)) {
                tap_queue_packet(http_follow_tap, pinfo, next_tvb);
        }
        file_data = tvb_get_string_enc(wmem_packet_scope(), next_tvb, 0, tvb_captured_length(next_tvb), ENC_ASCII);
        proto_tree_add_string_format_value(http_tree, hf_http_file_data,
                next_tvb, 0, tvb_captured_length(next_tvb), file_data, "%u bytes", tvb_captured_length(next_tvb));
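The snippet builds the field value with tvb_get_string_enc(..., ENC_ASCII), so http.file_data is a text string rather than raw bytes. The rough Python illustration below shows why that is lossy for binary bodies. It assumes that each byte of 0x80 or above becomes U+FFFD (the replacement character). That assumption comes from my reading of Wireshark's ASCII decoding and is not verified against every version. The function names are hypothetical.

```python
def ascii_string_like(body: bytes) -> str:
    """Approximate an ENC_ASCII string conversion of a response body.

    Assumption: bytes >= 0x80 become U+FFFD, which is also what Python's
    "replace" error handler does for the ascii codec.
    """
    return body.decode("ascii", errors="replace")


def round_trips(body: bytes) -> bool:
    """True if the body survives conversion to an ASCII string and back."""
    text = ascii_string_like(body)
    # Bytes that were replaced by U+FFFD cannot be recovered on re-encoding.
    return text.encode("utf-8") == body
```

A plain-text playlist round-trips, but anything with high-bit bytes (images, compressed data) would not.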

Also, keep in mind that http.response_number is a counter ...

SYN-bit (2019-05-20 10:30:31 +0000)

Yep, I know about http.response_number being per TCP connection. My cut-and-paste from the real command line removed too much of the filter I was using.

http.response_number seems to work just fine for me (tshark 2.6.1). Here is what I get with a cut-down test:

$ tshark -r some.pcap -T fields -e http.file_data   tcp.stream eq 4 and http.response_number eq 2 
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-TARGETDURATION:5

...

Do you think that http.response_number isn't the correct field to use?

A quick Google search for http.response_number finds a few other instances where that field is being used to get the HTTP response body. See here and here.

pmqs (2019-05-20 11:15:00 +0000)

If you combine tcp.stream with http.response_number you will uniquely select a specific response. So no problem there.

It's just that the http.file_data field does not behave the way I expect it to in the traces I have tried. It does not give back the exact HTTP object. I was wondering whether it does work correctly for you and, if so, whether you can share the trace so I can try to find out why there is a difference in behavior :-)

SYN-bit (2019-05-20 13:15:01 +0000)

Sure. Is there a common place for uploading pcaps on this forum?

pmqs (2019-05-20 14:28:34 +0000)

Unfortunately not; you'd have to use a public file-sharing service like GitHub Gist, OneDrive, Dropbox, etc.

SYN-bit (2019-05-20 17:58:30 +0000)

Try this file: test.pcap

And run this:

tshark -r test.pcap -T fields -e http.file_data  http.response_number eq 1

pmqs (2019-05-21 08:00:40 +0000)

Thanks for the file. There is a difference between an exported HTTP object and the output of your command:

-rw-r--r--  1 sake  staff   3172 May 21 16:18 a70e4276-9bd4-4919-959d-e545aa33ddcf.m3u8
-rw-r--r--  1 sake  staff   3172 May 21 16:18 export-packet-bytes.bin
-rw-r--r--  1 sake  staff   3199 May 21 16:18 file_data
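The sketch below shows a quick way to see where two such files diverge. The function names are hypothetical. The paths are whatever you saved the exported object and the tshark output as.

```python
from pathlib import Path
from typing import Optional, Tuple


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Offset of the first differing byte, or None if a == b."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # All shared bytes match: the inputs differ only if one is longer.
    return None if len(a) == len(b) else min(len(a), len(b))


def compare_files(exported: str, file_data: str) -> Tuple[int, int, Optional[int]]:
    """Return (size of exported, size of file_data, first differing offset)."""
    a = Path(exported).read_bytes()
    b = Path(file_data).read_bytes()
    return len(a), len(b), first_difference(a, b)
```

For example, compare_files("a70e4276-9bd4-4919-959d-e545aa33ddcf.m3u8", "file_data") returns both file sizes and the offset of the first mismatch. That offset is where to start looking with a hex viewer.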

So, although the method may work for you for this object, there is no guarantee that it works for every object.

SYN-bit (2019-05-21 14:23:40 +0000)

Interesting: when I compare the export with http.file_data, I get exactly one byte of difference. The output from http.file_data has an extra trailing line feed (0x0a) that isn't present in the export. I wonder whether that comes from the fields output, given that it writes one or more fields followed by a newline?

I notice that the difference you are getting looks very much like the number of lines in the output file (26) + 1. Are you running tshark on Windows, by any chance, with the line-feed terminators in the file converted to CR+LF pairs?
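The arithmetic behind that guess is 3199 - 3172 = 27 = 26 + 1. Suppose every line feed in the body is written as two bytes (an escaped \n, or a CR+LF pair), and the fields output adds one newline at the end of the record. Then the output grows by one byte per line feed, plus one. The Python sketch below makes that prediction. It uses a hypothetical 26-line, 3172-byte body as a stand-in for the .m3u8 object and assumes a single-LF record terminator.

```python
def predicted_fields_length(body: bytes) -> int:
    """Predict the size of -T fields output for one http.file_data value.

    Assumes each LF in the body is written as two bytes (an escaped "\\n"
    or a CR+LF pair) and that a single LF terminates the record.
    """
    return len(body) + body.count(b"\n") + 1


# Hypothetical stand-in for the .m3u8 object: 26 lines of 122 bytes each.
body = (b"x" * 121 + b"\n") * 26
print(len(body), predicted_fields_length(body))  # 3172 3199
```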

Is this a bug or a feature of http.file_data? I've been trying to find a definition of what http.file_data is designed to do, but I don't see anything.

pmqs (2019-05-21 15:22:01 +0000)

The plot thickens. I put tshark on a Windows box to see if I could replicate what you are getting and I think I have.

On Windows, when I run:

tshark.exe -r test.pcap -T fields -e http.file_data  http.response_number eq 1

I get single-line output in which every line feed is written as the two characters \n. On Linux the line feeds are left as-is. I think that accounts for the extra bytes.

This means the behavior of http.file_data appears to be OS-dependent.

pmqs (2019-05-21 15:48:51 +0000)

I'm running Wireshark on macOS. I took another look at the data, and it seems the 0x0a in the original HTTP object is being transformed into the characters '\n'. I need to check whether the other files I tried were mangled in the same way.
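If you get the escaped form, you can usually recover a text body by reversing the escapes. The Python sketch below assumes tshark only emits C-style escapes such as \n, \r, \t and \\. That escape set is an assumption, not something taken from the tshark documentation. The function name is hypothetical.

```python
import re

# Assumption: only these C-style escapes appear in the fields output.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}


def unescape_fields_value(value: str) -> str:
    """Turn escape sequences such as \\n back into the characters they denote.

    Unknown escapes are left untouched rather than guessed at.
    """
    return re.sub(
        r"\\(.)",
        lambda m: _ESCAPES.get(m.group(1), m.group(0)),
        value,
    )
```

Strip the single newline that tshark adds at the end of each record before unescaping. Skip this step on a platform where the line feeds come through raw, since a literal backslash-n inside the body would be changed too. Even then, as noted in this thread, there is no guarantee that this reproduces every object exactly.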

SYN-bit (2019-05-21 21:05:05 +0000)

It appears Windows and macOS have the same behaviour, then.

pmqs (2019-05-21 21:12:24 +0000)

answered 2019-05-19 14:26:59 +0000

SYN-bit

As per the tshark -h output:

  --export-objects <protocol>,<destdir> save exported objects for a protocol to
                           a directory named "destdir"

There is no indication that a filter can be applied. I just tried it with TShark 3.0.1, and all HTTP objects are saved. It doesn't matter whether or not I have a display filter for a specific TCP stream active: all HTTP objects in the file are saved.

It would be useful if some filtering could be done, so you may want to add a feature request for it to Bugzilla.
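Until such a feature exists, one possible workaround is to write only the TCP stream of interest to a new capture and then run --export-objects on that file. The Python sketch below assumes tshark is on the PATH. It relies on -Y together with -w writing only the matching packets. This still saves every HTTP object in that connection, not just one http.response_number. The function names and file paths are hypothetical.

```python
import subprocess
from typing import List


def stream_only_command(pcap: str, stream: int, out_pcap: str) -> List[str]:
    """tshark argv that writes only the packets of one TCP stream to a new file."""
    return ["tshark", "-r", pcap, "-Y", f"tcp.stream eq {stream}", "-w", out_pcap]


def export_objects_command(pcap: str, destdir: str) -> List[str]:
    """tshark argv that saves every HTTP object in pcap to destdir."""
    return ["tshark", "-r", pcap, "--export-objects", f"http,{destdir}"]


def export_stream_objects(pcap: str, stream: int, tmp_pcap: str, destdir: str) -> None:
    # Step 1: narrow the capture to a single TCP connection.
    subprocess.run(stream_only_command(pcap, stream, tmp_pcap), check=True)
    # Step 2: export objects from the reduced file. This saves every object
    # in that connection, not just one response number.
    subprocess.run(export_objects_command(tmp_pcap, destdir), check=True)
```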


Comments

Thanks. The field http.file_data appears to be what I'm looking for.

pmqs (2019-05-20 07:38:25 +0000)


Stats

Asked: 2019-05-17 14:59:47 +0000

Seen: 15,744 times

Last updated: May 20 '19