This is a static archive of our old Q&A Site. Please post any new questions and answers at ask.wireshark.org.

Extracting Segmented SOAP XML Payload


The following script: http://ask.wireshark.org/questions/4639/extracting-soap-xml-payload/4835 extracts SOAP messages. However, it fails when a message is split across several TCP segments. Any hints on how to solve this problem?

asked 29 Jan '13, 09:50

masgad

edited 29 Jan '13, 10:30

Please upload a sample capture to http://cloudshark.org and post a link here. What version of tshark are you using?

(03 Feb '13, 10:30) helloworld

Thank you, helloworld, for your reply. I'm using tshark 1.9.0, and here is a sample pcap: http://cloudshark.org/captures/74a6deb7aa4e

(04 Feb '13, 06:29) masgad

2 Answers:


The original Lua script pulls out TCP data in raw form. In your case, your payload is encoded, so the raw form looks like gibberish.

To work with your packet capture, change the first 3 lines of the Lua script to the following:

-- tap uses dfilter for HTTP and XML, where SOAP content is found
local tap       = Listener.new(nil, "http && xml")
local xml_field = Field.new("xml")

I must admit this Lua script is a bit "hacky".

I recommend checking out tshark -z follow,tcp,ascii,STREAM combined with a little scripting (bash, awk, Python, etc.). For instance, the following bash script dumps each TCP stream in your pcap, including the SOAP ones, to its own file (one file per stream). Further scripting is needed to strip the non-SOAP text from the files.

#!/bin/bash

TSHARK=$HOME/src/wireshark/tshark
PCAP=$HOME/tmp/sample.pcap

# write the streams to individual files
while read -r stream; do
    echo "writing stream $stream -> $stream.txt"
    "$TSHARK" -qz "follow,tcp,ascii,$stream" -r "$PCAP" > "$stream.txt"
done < <("$TSHARK" -T fields -e tcp.stream -r "$PCAP" | sort -n | uniq)
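The extra cleanup step could be sketched in Lua, to match the rest of this thread: the function below keeps only the <soap:Envelope ...>...</soap:Envelope> spans from a file written by the loop above. The names extract_envelopes and clean_file are hypothetical, and the sketch assumes the soap: prefix seen in this capture and that each envelope appears as contiguous text. If the follow output inserts per-segment byte-count lines inside an envelope, those lines would need to be removed first.

```lua
-- Hypothetical cleanup helper (not part of tshark): keep only the SOAP
-- envelopes from a text blob and drop everything in between.
-- Assumes the "soap:" prefix and contiguous envelope text.
local OPEN  = "<soap:Envelope"
local CLOSE = "</soap:Envelope>"

local function extract_envelopes(text)
    local envelopes = {}
    local pos = 1
    while true do
        -- plain find (4th argument true): no Lua pattern interpretation
        local s = string.find(text, OPEN, pos, true)
        if not s then break end
        local _, e = string.find(text, CLOSE, s, true)
        if not e then break end             -- unterminated envelope: stop
        envelopes[#envelopes + 1] = text:sub(s, e)
        pos = e + 1                         -- continue after this envelope
    end
    return envelopes
end

-- Reads one per-stream file (e.g. "0.txt") and prints its envelopes.
local function clean_file(path)
    local f = assert(io.open(path, "r"))
    local text = f:read("*a")
    f:close()
    for _, env in ipairs(extract_envelopes(text)) do
        print(env)
    end
end
```

For example, loading this in a standalone Lua interpreter and calling clean_file("0.txt") would print the envelopes found in stream 0.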

answered 04 Feb '13, 22:13

helloworld

I tried both suggestions above. The first one (using the "http && xml" filter in the Listener) still misses a lot of SOAP messages. The second one captures more SOAP messages, but it still misses some, and its output needs more cleaning to strip out the non-SOAP text. I am working on a workaround and will post it later. Thank you.

(05 Feb '13, 13:21) masgad

Which packets in your sample.pcap contain the messages that are missing from the output?

(05 Feb '13, 22:46) helloworld

The Lua script cannot extract the SOAP parts of segmented PDUs, except for the last segment. I think it cannot identify the earlier segments as XML or HTTP. I noticed that the Wireshark GUI also shows only the last packet as HTTP/XML, whereas the previous packets are shown as TCP.

(11 Feb '13, 10:34) masgad
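This matches how TCP reassembly works in Wireshark: earlier segments are held until the higher-level dissector has the whole PDU, and the reassembled message is then dissected (and shown as HTTP/XML) in the frame that completes it. The function below is a rough plain-Lua illustration of the kind of completeness test involved for HTTP, not Wireshark's actual code. It assumes Content-Length framing only (no chunked transfer encoding), treats a message without Content-Length as header-only, and http_message_complete is a hypothetical name.

```lua
-- Illustrative only: decide whether buf holds a complete HTTP message,
-- assuming Content-Length framing (no chunked transfer encoding).
local function http_message_complete(buf)
    local hdr_end = string.find(buf, "\r\n\r\n", 1, true)
    if not hdr_end then
        return false                     -- header block still arriving
    end
    local headers = buf:sub(1, hdr_end - 1)
    -- header names are case-insensitive, so match on a lowercased copy
    local cl = headers:lower():match("\ncontent%-length:%s*(%d+)")
    if not cl then
        return true                      -- assumption: no body expected
    end
    -- the body starts right after the 4-byte "\r\n\r\n" separator
    local body_bytes = #buf - (hdr_end + 3)
    return body_bytes >= tonumber(cl)
end
```

Until this kind of check succeeds, a reassembling dissector has nothing to decode, which is why only the final segment carries the HTTP/XML fields.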


I made a quick-and-dirty workaround based on the code from this post: http://ask.wireshark.org/questions/4639/extracting-soap-xml-payload/4835. The modified code differs from the original in the following ways:

  1. It sets the Listener's filter to tcp rather than http or xml
  2. It creates two separate Field extractors:

local xml_field = Field.new("xml")

local tcp_segment = Field.new("tcp.data")

  3. It checks whether the analyzed packet contains any segmented PDU data and, if so, appends its contents to soap_message.

  4. After each packet, it checks whether soap_message contains both the opening and closing soap:Envelope tags. If so, it writes soap_message to the file and clears it before processing the next packet.
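Steps 3 and 4 can be sketched outside Wireshark in plain Lua, with each call standing in for one packet's tcp.data or xml text. The function name on_packet is hypothetical; the buffer and the two tag checks mirror soap_message, soap_begin and soap_end in the script below. It uses a plain (non-pattern) find, which gives the same result here because neither tag contains Lua pattern magic characters.

```lua
-- Plain-Lua sketch of steps 3 and 4 (on_packet is a hypothetical name).
local soap_message = ""

-- Appends one packet's text; returns the whole buffer once it holds both
-- Envelope tags (clearing it, i.e. the "flush"), otherwise returns nil.
local function on_packet(data)
    soap_message = soap_message .. data
    local soap_begin = string.find(soap_message, "<soap:Envelope", 1, true)
    local soap_end   = string.find(soap_message, "</soap:Envelope>", 1, true)
    if soap_begin and soap_end then
        local complete = soap_message
        soap_message = ""
        return complete
    end
    return nil
end
```

Because there is a single buffer shared by all packets, anything that arrives between the first and last segment of a message is appended to it as well.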

The problem with this workaround is that it sometimes reassembles incorrectly: if segment data from another packet arrives while a SOAP message is still incomplete, that data gets appended to the message as well :(
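One possible way to avoid mixing segments from different connections, offered here as an untested assumption rather than part of the workaround, is to keep one buffer per TCP stream. In a real tap the key could come from a Field.new("tcp.stream") extractor; in this plain-Lua sketch the stream ids, the buffers table and feed() are hypothetical. It also cuts each message at the closing tag, so text that follows an envelope stays buffered for the next one instead of being flushed with it.

```lua
-- Hypothetical per-stream reassembly: one buffer per TCP stream index.
local OPEN  = "<soap:Envelope"
local CLOSE = "</soap:Envelope>"

local buffers = {}   -- stream id -> accumulated text for that stream

-- Appends a chunk to its own stream's buffer and returns a list of every
-- envelope that is now complete for that stream.
local function feed(stream_id, chunk)
    local buf = (buffers[stream_id] or "") .. chunk
    local done = {}
    while true do
        local s = string.find(buf, OPEN, 1, true)
        if not s then break end
        local _, e = string.find(buf, CLOSE, s, true)
        if not e then
            buf = buf:sub(s)             -- drop text before the partial envelope
            break
        end
        done[#done + 1] = buf:sub(s, e)
        buf = buf:sub(e + 1)             -- keep whatever follows the closing tag
    end
    buffers[stream_id] = buf
    return done
end
```

Interleaved traffic from another stream then lands in a different buffer, so it cannot corrupt a message that is still being reassembled.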

Here is the code:

-- tap uses filter for tcp and ignores retransmissions
local tap       = Listener.new(nil, "tcp && !tcp.analysis.retransmission")
-- XML field extractor
local xml_field = Field.new("xml")
-- TCP segment data extractor
local tcp_segment = Field.new("tcp.data")
local file      = nil
local soap_message = ''

-- #######################################################################
-- # If not already open, this opens a file for writing (append mode)
-- #######################################################################
local function open_file()
    if not file then
        local path = "." .. "/temp.xml"
        print("opening file:", path)
        file = assert(io.open(path, "a"), "Can't open file for writing")
    end
end

local HTML_REQ = { ["HTTP"] = 1, ["GET "] = 1, ["PUT "] = 1, ["POST"] = 1, }

-- #######################################################################
-- # Extracts the XML field from the buffer and writes the field to file
-- #######################################################################
local function handle_xml(pinfo, tvb)
    if not file then
        print("no file...ignoring packet")
        return
    end

    -- extract xml data if contained in single packet
    local data = ''
    local fieldinfo = xml_field()
    local segmentinfo = tcp_segment()

    -- Check for PDUs
    if segmentinfo then
        data = tvb(segmentinfo.offset):string()
    elseif fieldinfo then
        data = tvb(fieldinfo.offset):string()
    end

    local starts = data:sub(1,4)

    -- some of these packets start w/HTTP header...skip to XML
    if HTML_REQ[starts] ~= nil then
        -- local pos = string.find(data, "<%?xml version")
        local pos = string.find(data, "<soap:Envelope")
        if not pos then return end
        data = data:sub(pos)
    end

    soap_message = soap_message .. data

    print("\n\n-- #"..pinfo.number.." ---------------------------------------------------\n")
    print(data)

    local soap_begin = string.find(soap_message, "<soap:Envelope")
    local soap_end = string.find(soap_message, "</soap:Envelope>")

    -- Check for the completion of the SOAP message
    if soap_begin and soap_end then
        file:write("\n\n-- #"..pinfo.number.." ---------------------------------------------------\n")
        file:write(soap_message)
        soap_message = ''
    end
end

-- #######################################################################
-- # tap.packet() is called to notify the Listener of a packet that
-- # matches its filter rule ("tcp && !tcp.analysis.retransmission" in
-- # this case). This can be called multiple times before tap.draw().
-- #######################################################################
function tap.packet(pinfo, tvb)
    print("\ntap.packet", "#"..pinfo.number)

    -- XXX: Compensate for no tap.reset() in tshark
    if not gui_enabled() then open_file() end

    -- wrap the handler in a pcall() in case an error occurs
    local ok, msg = pcall(function()
        handle_xml(pinfo, tvb)
    end)

    -- print any error and bow out
    if not ok then
        print("wtf!", msg)
    end
end

-- #######################################################################
-- # tap.draw() is called to notify the Listener to "draw" its results
-- # that were accumulated in tap.packet(). This is normally called after
-- # tap.packet(), based on "Preferences > Statistics > Tap update interval".
-- #######################################################################
function tap.draw()
    print("tap.draw")
    -- flush toilet (NOTE: When $file is garbage collected, it's
    -- automatically flushed and closed...that doesn't mean we
    -- can't do it sooner to free resources.)
    if file then
        print("closing file")
        file:close()
        file = nil
    end
end

-- #######################################################################
-- # tap.reset() is called to notify the Listener to reset any variables
-- # or counters in preparation for a packet (passed to tap.packet()).
-- # This can be called multiple times before a packet is even seen.
-- #
-- # XXX: tshark doesn't call this function, but Wireshark does. Bug?
-- #######################################################################
function tap.reset()
    print("tap.reset")
    open_file()
end

answered 11 Feb '13, 11:33

masgad

edited 11 Feb '13, 12:17