PCAP load speed difference between Wireshark and tshark
I have some PCAP files from which I'm trying to extract metadata. I am doing this with tshark: opening the file, extracting a couple dozen fields, then writing the table to disk. I've noticed that this process can be pretty time-consuming, sometimes up to 60 minutes for a single PCAP file. I am performing reverse DNS on the data, using the default settings (-N dmN), and I have the same reverse DNS settings in Wireshark. I understand that reverse DNS is fairly time-consuming relative to the other processing that tshark/Wireshark performs. However, when opening the same file in Wireshark and in tshark, Wireshark loads the file in a matter of seconds, while tshark takes minutes. My tshark command is:
tshark -r my_pcap_file.pcap \
-2 \
-T fields \
-E separator=/t \
-E header=y \
-E quote=d \
-e frame.time_epoch \
-e frame.len \
-e frame.protocols \
-e _ws.malformed \
-e _ws.col.Protocol \
-e _ws.col.Length \
-e ip.rec_rt \
-e ip.src \
-e ip.dst \
-e ip.src_host \
-e ip.dst_host > my_pcap_file.tsv
I've timed the processing of a few files using /usr/bin/time followed by the tshark command shown above. To measure the fastest possible time, neglecting writing the output to disk, I directed the output to /dev/null instead of my_pcap_file.tsv. The resulting file sizes and timing outputs are:
20.7 MB: 6.16 user 3.51 system 11:36.95 elapsed 1% CPU
10.2 MB: 2.18 user 3.28 system 10:45.22 elapsed 0% CPU
42.1 MB: 6.70 user 5.13 system 44:07.60 elapsed 0% CPU
Is there a known reason for this speed difference? More importantly, is there a way I can speed up the tshark processing?
Just for fun, have you timed it without the 2nd pass (-2)?
Are they both using the same profile and enabled protocols list?
Wireshark's just loading the file, which might just involve doing enough dissection to 1) set the columns in the packet summary list and 2) get information needed to dissect later packets. TShark is doing a full dissection - it has to, in order to find the particular fields you're reporting - and it's doing two passes. Try comparing Wireshark with tshark -r my_pcap_file.pcap >/dev/null.
The most significant slow-down in tshark is due to reverse DNS lookup. For example, I timed tshark processing the same file, with the only difference between the commands being -N dmnN vs. -n. The first: 6.21user 3.86system 37:31.88elapsed 0%CPU; the second: 3.26user 0.29system 0:04.59elapsed 77%CPU. But the same file opens in Wireshark, with DNS lookup, in a matter of seconds. I opened the file in Wireshark using the command line: wireshark -r my_pcap.pcap -N dmnN. The source and destination hosts are populated in Wireshark, and I can export the results as a CSV, but I have to use the GUI. In this case, I ensured that both Wireshark and tshark exported the same fields and had the same enabled protocols.
@Guy Harris, when I simply read in the pcap and output to /dev/null, it is super fast. Unfortunately, this does not capture reverse DNS, which I need. As I dig into this more, the question seems to be: How/why does Wireshark perform reverse DNS faster than tshark?
You originally wrote that you used -N dmN (the default settings, which do not use an external network resolver), which made less sense. If you are using -N dmnN, that makes more sense.
Wireshark performs external DNS lookups asynchronously. When an IP address that needs to be looked up is encountered, the request is sent off but doesn't block the rest of the dissection. When the result comes back, it is used to fill in the information in the columns and in the dissection tree. You may have noticed the source and destination columns changing while the file is open when you use Wireshark with external DNS resolution. When Wireshark has finished loading the file, it hasn't actually finished doing all the external DNS lookups.
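As a rough illustration of the non-blocking pattern described above (this is not Wireshark's actual code, just a minimal Python sketch), the loop below hands each unique address to a thread pool and keeps going, then fills in the names once the results are back. The names reverse_lookup and resolve_without_blocking, the (src, dst) row layout, and the worker count are all hypothetical choices made for the example.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def reverse_lookup(ip):
    """Return the PTR name for ip, or the address itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


def resolve_without_blocking(rows, resolver=reverse_lookup, workers=32):
    """rows is a list of (src_ip, dst_ip) pairs; returns rows with names appended.

    Hypothetical helper: lookups are queued as addresses are first seen,
    so walking the rows never waits on an individual DNS reply.
    """
    pending = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for src, dst in rows:
            for ip in (src, dst):
                if ip not in pending:
                    # One request per unique address; submit() returns at once.
                    pending[ip] = pool.submit(resolver, ip)
        # Leaving the "with" block waits for all outstanding lookups.
    names = {ip: future.result() for ip, future in pending.items()}
    return [(src, dst, names[src], names[dst]) for src, dst in rows]
```

One way this might be applied, assuming the ip.src and ip.dst columns have already been exported with name resolution turned off (-n): feed those address pairs to resolve_without_blocking and write the returned host names out as extra columns. Because many lookups are in flight at once, the total wait is no longer the sum of every individual lookup.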
tshark performs external DNS lookups synchronously, at least on the second pass, or ...