PCAP load speed difference between Wireshark and tshark

asked 2024-01-11 20:44:31 +0000

updated 2024-01-12 08:21:07 +0000

Guy Harris

I have some PCAP files from which I'm trying to extract metadata. I do this with tshark: I open the file, extract a couple dozen fields, then write the table to disk. This process can be very slow, sometimes taking up to 60 minutes for a single PCAP file. I am performing reverse DNS on the data using the default settings (-N dmN), and I have the same reverse DNS settings in Wireshark. I understand that reverse DNS is fairly time-consuming relative to the other work that tshark/Wireshark performs. However, when I open the same file in both, Wireshark loads it in a matter of seconds, while tshark takes minutes. My tshark command is:

tshark -r my_pcap_file.pcap \
    -2 \
    -T fields \
    -E separator=/t \
    -E header=y \
    -E quote=d \
    -e frame.time_epoch \
    -e frame.len \
    -e frame.protocols \
    -e _ws.malformed \
    -e _ws.col.Protocol \
    -e _ws.col.Length \
    -e ip.rec_rt \
    -e ip.src \
    -e ip.dst \
    -e ip.src_host \
    -e ip.dst_host > my_pcap_file.tsv

I've timed the processing of a few files using /usr/bin/time followed by the tshark command shown above. To measure the fastest possible time, neglecting writing the output to disk, I directed the output to /dev/null instead of my_pcap_file.tsv. The resulting file sizes and timing outputs are:

20.7 MB: 6.16 user 3.51 system 11:36.95 elapsed 1% CPU

10.2 MB: 2.18 user 3.28 system 10:45.22 elapsed 0% CPU

42.1 MB: 6.70 user 5.13 system 44:07.60 elapsed 0% CPU

Is there a known reason for this speed difference? More importantly, is there a way I can speed up the tshark processing?


Comments

Just for fun, have you timed it without the 2nd pass (-2)?

Are they both using the same profile and enabled protocols list?

Chuckc ( 2024-01-11 23:23:11 +0000 )

> Wireshark loads the file in a matter of seconds, while tshark will take minutes.

Wireshark's just loading the file, which might just involve doing enough dissection to 1) set the columns in the packet summary list and 2) get information needed to dissect later packets. TShark is doing a full dissection - it has to, in order to find the particular fields you're reporting - and it's doing two passes. Try comparing Wireshark with tshark -r my_pcap_file.pcap >/dev/null.

Guy Harris ( 2024-01-12 08:24:56 +0000 )

The most significant slowdown in tshark is due to reverse DNS lookup. For example, I timed tshark processing the same file, with the only difference between the commands being -N dmnN vs. -n. The first: 6.21user 3.86system 37:31.88elapsed 0%CPU; the second: 3.26user 0.29system 0:04.59elapsed 77%CPU. But the same file, opened in Wireshark with DNS lookup, loads in a matter of seconds. I opened the file in Wireshark from the command line: wireshark -r my_pcap.pcap -N dmnN. The source and destination hosts are populated in Wireshark, and I can export the results as a CSV, but I have to use the GUI. In this case, I ensured that both Wireshark and tshark exported the same fields and had the same enabled protocols.

ItsaMeJJ ( 2024-01-17 21:29:38 +0000 )
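
For anyone who wants to reproduce the -n versus -N dmnN comparison described above, here is a minimal sketch. The helper name elapsed_seconds, the PCAP variable, and the shortened field list are illustrative assumptions, not part of the original command. The script does nothing unless tshark and the capture file are both present.

```shell
#!/usr/bin/env bash
# Sketch: time the same tshark extraction with and without external DNS lookups.
# PCAP and the field list are placeholders; adjust them to the capture under test.

PCAP="${PCAP:-my_pcap_file.pcap}"

# Run a command, discard its output, and print the elapsed wall-clock seconds.
elapsed_seconds() {
    local start end
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true   # a failing command is still timed
    end=$(date +%s)
    echo $(( end - start ))
}

# Only run the comparison when tshark and the capture are actually available.
if command -v tshark >/dev/null 2>&1 && [ -r "$PCAP" ]; then
    for resolve in "-n" "-N dmnN"; do
        # $resolve is intentionally unquoted so "-N dmnN" splits into two arguments.
        secs=$(elapsed_seconds tshark -r "$PCAP" $resolve -T fields \
            -e ip.src -e ip.dst -e ip.src_host -e ip.dst_host)
        echo "tshark $resolve: ${secs}s"
    done
fi
```

Whole-second resolution is coarse, but it is enough to show a gap like the one reported here (seconds versus tens of minutes). /usr/bin/time remains the better tool when the user/system CPU split matters.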

@Guy Harris, when I simply read in the pcap and output to /dev/null, it is super fast. Unfortunately, this does not capture reverse DNS, which I need. As I dig into this more, the question seems to be: How/why does Wireshark perform reverse DNS faster than tshark?

ItsaMeJJ ( 2024-01-17 21:32:44 +0000 )

You originally wrote that you used -N dmN (the default settings, which do not use an external network resolver), which made less sense. If you are using -N dmnN, that makes more sense.

> How/why does Wireshark perform reverse DNS faster than tshark?

Wireshark performs external DNS lookups asynchronously. When an IP address that needs to be looked up is encountered, the request is sent off but doesn't block the rest of the dissection. When the result comes back, it is used to fill in the information in the columns and in the dissection tree. You may have noticed the source and destination columns changing while the file is open when you use Wireshark with external DNS resolution. When Wireshark has finished loading the file, it hasn't actually finished doing all the external DNS lookups.

tshark performs external DNS lookups synchronously, at least on the second pass, or ...

johnthacker ( 2024-01-17 23:14:26 +0000 )
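
If blocking lookups are the bottleneck, one possible workaround (my own assumption, not something proposed in this thread) is to let tshark run with -n, resolve each distinct address once and in parallel outside tshark, and then join the names back onto the table. The sketch below is untested against real captures. The function names, the intermediate file names, and the parallelism of 16 are all hypothetical. It uses getent hosts, which goes through the system resolver, so the names may differ from what tshark itself would report.

```shell
#!/usr/bin/env bash
# Sketch of a workaround (an assumption, not from the thread): run tshark with -n,
# then resolve every distinct address once, in parallel, and join the names back.
# Assumes one address per field (hence -E occurrence=f) and unquoted TSV output.

PCAP="${PCAP:-my_pcap_file.pcap}"

# Print the distinct, non-empty values of the given 1-based columns (e.g. "3,4")
# from a tab-separated table on stdin, skipping the header row.
unique_addresses() {
    awk -F'\t' -v cols="$1" '
        BEGIN { n = split(cols, c, ",") }
        NR > 1 { for (i = 1; i <= n; i++) if ($(c[i] + 0) != "") print $(c[i] + 0) }
    ' | sort -u
}

# Read addresses on stdin and print "address<TAB>name", running up to $1 lookups
# at once. The name is left empty when the lookup fails.
resolve_parallel() {
    xargs -P "${1:-8}" -n 1 sh -c '
        name=$(getent hosts "$1" 2>/dev/null | awk "{ print \$2; exit }")
        printf "%s\t%s\n" "$1" "$name"
    ' resolve
}

# Append src_host and dst_host columns to a TSV.
# $1 = map file, $2 = TSV with header, $3/$4 = source/destination column numbers.
# Unresolved addresses fall back to the address itself.
add_host_columns() {
    awk -F'\t' -v OFS='\t' -v s="$3" -v d="$4" '
        function lookup(a) { return (a in host && host[a] != "") ? host[a] : a }
        FILENAME == ARGV[1] { host[$1] = $2; next }
        FNR == 1            { print $0, "src_host", "dst_host"; next }
                            { print $0, lookup($s), lookup($d) }
    ' "$1" "$2"
}

# Only run the pipeline when tshark and the capture are actually available.
if command -v tshark >/dev/null 2>&1 && [ -r "$PCAP" ]; then
    tshark -r "$PCAP" -n -T fields -E header=y -E occurrence=f \
        -e frame.time_epoch -e frame.len -e ip.src -e ip.dst > capture.tsv
    unique_addresses 3,4 < capture.tsv | resolve_parallel 16 > names.tsv
    add_host_columns names.tsv capture.tsv 3 4 > my_pcap_file.tsv
fi
```

Because each address is looked up only once and the lookups overlap, the total wait should be closer to the slowest few queries than to the sum of all of them. Whether that holds for a given capture depends on the resolver and on how many distinct addresses the capture contains.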