Parsing Wireshark Capture Files

asked 2019-09-25 19:48:11 +0000

joe.c

I am trying to parse .pcap files from Wireshark that contain captured TCP packets. I am parsing these files using the information listed here:

The TCP packets being sent across have a parameter stating the size of the packet. However, this size is often greater than the size in the Wireshark packet header that sits on top of each packet. Yet these packet sizes are much less than the snaplen in the global header (which, if I understand correctly, is the largest a captured packet could be).

Is there a property of TCP I am not seeing? I don't understand how Wireshark could be capturing less than the packet size itself.


The TCP packets being sent across have a parameter stating the size of the packet. However, this size is often greater than the size in the Wireshark packet header that sits on top of each packet.

What do you mean by "the size of the packet"? The pcap header has two size values, one of which is the size of the link-layer frame on the network, and the other is the number of bytes of the frame that were captured. The latter may be less than the former if the capture was done with a snapshot length less than the size of the frame on the network.

Guy Harris ( 2019-09-25 20:55:00 +0000 )

@Guy Harris Okay. I'm probably just misunderstanding it then. So for example, when logging the output of my parser:

Wireshark Incl Size: 1514 Bytes
Wireshark Orig Size: 1514 Bytes
Snaplen Size (found via .pcap global header): 65535

However, the packets I am parsing, which I guess would be the TCP packets' payload, have their own size in the payload header, which is 16244 bytes. Is it safe to assume we should be chaining these TCP packets together? How would I know my next packet is a TCP packet?

joe.c ( 2019-09-26 11:13:24 +0000 )

2 Answers

answered 2019-09-26 17:30:34 +0000

Guy Harris

Incl Size: 1514

That's the number of bytes of the frame that were captured. That's a full-size Ethernet packet, but without the FCS (which is often not provided to the host by the adapter, by default).

Orig Size: 1514

That's the number of bytes of the frame that were on the wire. The Incl Size is equal to the Orig Size, so nothing was chopped off ("sliced") by the capture process having a snapshot length shorter than the maximum frame size.
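To make the two size fields concrete, here is a minimal Python sketch of reading them. It assumes the classic pcap layout (a 24-byte global header followed by a 16-byte record header in front of every frame) with microsecond timestamps; pcapng files and nanosecond-resolution pcap files use different formats and are not handled. The function name `parse_pcap` is only for illustration.

```python
import struct

PCAP_GLOBAL_HEADER_LEN = 24
PCAP_RECORD_HEADER_LEN = 16


def parse_pcap(data: bytes):
    """Return (snaplen, linktype, records) for a classic pcap byte string.

    Each record is a tuple (incl_len, orig_len, frame_bytes).
    """
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file was written in little-endian byte order
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file was written in big-endian byte order
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    # Fields after the magic number: version major, version minor,
    # timezone offset, timestamp accuracy, snaplen, link-layer type.
    _vmaj, _vmin, _zone, _sigfigs, snaplen, linktype = struct.unpack(
        endian + "HHiIII", data[4:PCAP_GLOBAL_HEADER_LEN])

    records = []
    offset = PCAP_GLOBAL_HEADER_LEN
    while offset + PCAP_RECORD_HEADER_LEN <= len(data):
        # Record header: timestamp seconds, timestamp microseconds,
        # captured length (Incl Size), on-the-wire length (Orig Size).
        _ts_sec, _ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + PCAP_RECORD_HEADER_LEN])
        offset += PCAP_RECORD_HEADER_LEN
        frame = data[offset:offset + incl_len]
        offset += incl_len
        records.append((incl_len, orig_len, frame))
    return snaplen, linktype, records
```

A record with `incl_len < orig_len` was sliced by the snapshot length; with `incl_len == orig_len`, as in the 1514/1514 case above, the whole frame is in the file.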

However, the packets I am parsing, which I guess would be the TCP packets' payload, have their own size in the payload header, which is 16244 bytes.

The packets you're parsing would be the payload of the TCP segments; a packet at a layer above the link layer can be larger than a single link-layer packet, and parsing that requires assembling multiple link-layer packets together. See SYN-bit's answer for the rest of your questions.

answered 2019-09-26 07:42:16 +0000

SYN-bit

Since TCP is a streaming protocol, the packet boundaries are just artificial cuts in the data stream. This is needed because every network has a finite maximum length per packet, called the Maximum Transmission Unit (MTU). For normal Ethernet, the MTU is 1500 bytes, which means Ethernet can send 1500 bytes of data to another Ethernet host. Of these 1500 bytes, 20 are needed for the IP header and 20 for the TCP header (when neither carries options), leaving 1460 bytes for the TCP payload. This is called the Maximum Segment Size (MSS).

If TCP needs to send a block of data (a higher-layer Protocol Data Unit (PDU)) that is larger than this MSS, it will break up the PDU into smaller pieces of at most MSS bytes and send those segments as individual packets to the receiver. The receiver then strips the Ethernet/IP/TCP headers and places the TCP payload in the receive buffer. All segments are processed that way until the full PDU is received and the data can then be handed over to the application in one piece.
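The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes IPv4 and TCP headers without options and a 14-byte Ethernet header without a VLAN tag, and the helper names are made up for illustration. With TCP options such as timestamps, each segment carries less payload, so the segment count is a lower bound.

```python
import math

ETHERNET_HEADER = 14  # destination MAC + source MAC + EtherType, no FCS
IP_HEADER = 20        # IPv4 header without options
TCP_HEADER = 20       # TCP header without options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def segments_needed(pdu_len: int, mss: int) -> int:
    """Minimum number of TCP segments needed to carry pdu_len bytes."""
    return math.ceil(pdu_len / mss)


mtu = 1500
mss = mss_for_mtu(mtu)
print(mss)                          # 1460
print(mtu + ETHERNET_HEADER)        # 1514, the frame size seen in the capture
print(segments_needed(16244, mss))  # 12
```

So a 16244-byte PDU like the one in the question has to be spread over at least 12 full-size frames.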

If I have read your question correctly, you are trying to parse a pcap file in which there are packets of protocol X which are transported over TCP. Protocol Data Units (PDUs) of protocol X have a length header that indicates the length of the PDU. However, that length can be greater than the Maximum Segment Size (MSS), so the PDU will not fit into one packet, as explained above. Your parser needs to read the length of the PDU and keep reading TCP packets until all bytes of the PDU have been received, so that it can dissect it. This is called reassembly in Wireshark, and your parser should do something similar.
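The "keep reading until the whole PDU has arrived" loop might look roughly like the sketch below. The framing of protocol X is not known here, so the code assumes a purely hypothetical layout: a 4-byte big-endian length field giving the size of the body that follows it. The class name `PduReassembler` is also just for illustration, and the payloads must already be in stream order for one direction of one connection.

```python
import struct

# Assumption for illustration only: protocol X starts every PDU with a
# 4-byte big-endian field holding the length of the body that follows.
LENGTH_FIELD = struct.Struct(">I")


class PduReassembler:
    """Collects in-order TCP payload bytes and cuts them into PDUs."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, tcp_payload: bytes):
        """Add one TCP payload; return the PDU bodies that are now complete."""
        self._buffer += tcp_payload
        pdus = []
        while len(self._buffer) >= LENGTH_FIELD.size:
            (body_len,) = LENGTH_FIELD.unpack_from(self._buffer, 0)
            end = LENGTH_FIELD.size + body_len
            if len(self._buffer) < end:
                break  # PDU continues in a later TCP segment
            pdus.append(bytes(self._buffer[LENGTH_FIELD.size:end]))
            del self._buffer[:end]  # keep any bytes of the next PDU
        return pdus
```

Note that a TCP segment can also contain the end of one PDU and the start of the next, which is why leftover bytes stay in the buffer instead of being discarded.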


You pretty much nailed it. A few parsing questions, though.

From my understanding, there is a global pcap header, and then another, smaller pcap header (the "Record Header") on each of the packets. So would each TCP packet have one record header?

There could be other traffic on the network, so I'm guessing my procedure would be:

Find Pcap Record Header -> Determine if payload of Record Header is a TCP Packet -> Check TCP Packet for Protocol X that I am searching for.

If the protocol is chained among multiple TCP packets, is there some sort of guarantee that no other packets got in between that stream? If my message was broken into 3 TCP packets and I find the first packet, will I know that the next 2 packets contain the data?

joe.c ( 2019-09-26 11:18:46 +0000 )

Yes, if your protocol X PDU is segmented into 3 TCP segments, you need to collect those three packets and reassemble the payload data into the original PDU. Each individual TCP/UDP connection can be identified by a 5-tuple: SourceIP, DestinationIP, Protocol, SourcePort, DestinationPort (where protocol is TCP for your protocol X as it is transported over TCP).

All packets with the same 5-tuple belong to the same TCP connection (packets in the reverse direction carry the same values with source and destination swapped). Within that connection, the data is ordered by sequence numbers. With the sequence numbers, the receiving end can put the segments back in order when they are received out of order, and it can detect whether data is missing.
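As a rough sketch of ordering by sequence number, the Python function below takes the (sequence number, payload) pairs of one direction of one connection and rebuilds the byte stream. It assumes you already know the sequence number of the first data byte (normally the SYN's sequence number plus one) and that the stream is shorter than 4 GiB; the function name is hypothetical, and real reassembly has more corner cases than this handles.

```python
SEQ_MODULUS = 1 << 32  # TCP sequence numbers are 32 bits and wrap around


def reassemble_stream(segments, first_data_seq):
    """Rebuild the byte stream of ONE direction of ONE TCP connection.

    segments: iterable of (seq, payload) in capture order.
    first_data_seq: sequence number of the first payload byte.
    Returns the contiguous bytes from the start up to the first gap.
    """
    # Convert each sequence number to an offset into the stream and sort,
    # which puts out-of-order segments back in place.
    ordered = sorted(
        ((seq - first_data_seq) % SEQ_MODULUS, payload)
        for seq, payload in segments
        if payload  # pure ACKs carry no data
    )
    stream = bytearray()
    for offset, payload in ordered:
        if offset > len(stream):
            break  # data is missing in the capture; stop at the gap
        already_have = len(stream) - offset
        if already_have < len(payload):
            # Skip bytes already seen (retransmissions, overlaps).
            stream += payload[already_have:]
    return bytes(stream)
```

For example, `reassemble_stream([(1005, b"world"), (1000, b"hello")], 1000)` returns `b"helloworld"` even though the segments were captured in the wrong order.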

Your parser will need to do the same thing: collect the packets for the same connection, extract the data and put that data in the right order. Now things can get complex as several things can happen, data might be ...

SYN-bit ( 2019-09-26 11:50:17 +0000 )

Where is this 5-tuple? I was going off of which doesn't seem to have information on the protocol. Is there an additional layer?

joe.c ( 2019-09-26 13:13:38 +0000 )

Are there other layers? Or would it simply be

PCAP GLOBAL HEADER -> PCAP Header -> TCP HEADER -> TCP Payload -> [...additional TCP Packets.. or the next PCAP Header]

It's hard to tell whether I should be expecting other packets in between, if that makes sense. Thanks so much!

joe.c ( 2019-09-26 13:17:12 +0000 )

The 5-tuple consists of the following elements in different layers:

  • source IP address (IP layer)
  • destination IP address (IP layer)
  • protocol number (IP layer)
  • source port (TCP layer)
  • destination port (TCP layer)
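To show where those five fields sit in a captured frame, here is a minimal Python sketch that walks the layers. It assumes an Ethernet capture without VLAN tags, IPv4 only, and no IP fragmentation; the function name `tcp_five_tuple` is made up for this example. The EtherType and the IP protocol number are also how a parser can tell whether a given record holds a TCP packet at all.

```python
import socket
import struct

ETHERNET_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def tcp_five_tuple(frame: bytes):
    """Return ((src_ip, dst_ip, proto, src_port, dst_port), seq, payload),
    or None if the frame is not TCP over IPv4 over Ethernet."""
    if len(frame) < ETHERNET_HEADER_LEN + 20:
        return None
    # Ethernet: the EtherType at bytes 12-13 names the next layer.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None

    ip = ETHERNET_HEADER_LEN
    version_ihl = frame[ip]
    if version_ihl >> 4 != 4:
        return None
    ip_header_len = (version_ihl & 0x0F) * 4  # IHL counts 32-bit words
    (total_len,) = struct.unpack_from("!H", frame, ip + 2)
    proto = frame[ip + 9]  # protocol number: 6 means TCP
    if proto != IPPROTO_TCP:
        return None
    src_ip = socket.inet_ntoa(frame[ip + 12:ip + 16])
    dst_ip = socket.inet_ntoa(frame[ip + 16:ip + 20])

    tcp = ip + ip_header_len
    if len(frame) < tcp + 20:
        return None
    src_port, dst_port, seq = struct.unpack_from("!HHI", frame, tcp)
    tcp_header_len = (frame[tcp + 12] >> 4) * 4  # data offset, 32-bit words
    # Use the IP total length so Ethernet padding is not taken for payload.
    payload = frame[tcp + tcp_header_len:ip + total_len]
    return (src_ip, dst_ip, proto, src_port, dst_port), seq, payload
```

Frames for which this returns None (ARP, UDP, IPv6 and so on) can simply be skipped, and the returned 5-tuple can serve as a dictionary key for grouping packets per connection and direction.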

Parsing pcap files is not an easy task, and it seems you are in a little over your head.

What do you want to accomplish, and why not use Wireshark or tshark for it? They already cover a lot of ground and can be extended quite easily with a protocol dissector for your protocol and/or a Lua script to extract things from your protocol.

SYN-bit ( 2019-09-26 14:30:22 +0000 )

Asked: 2019-09-25 19:48:11 +0000

Seen: 6,227 times

Last updated: Sep 26 '19