Dumpcap/TCPDump packet loss when writing data to disk with high traffic
I'm using PyShark to capture packets on a relatively high traffic network. I noticed that packets were dropped in PyShark due to the high traffic.
Checking the PyShark code, it runs Dumpcap and pipes its output to TShark.
Running both manually with these commands results in about 33% packet loss (tested with a tcpreplay of a pcap of 9786 packets in about 6 seconds):
mkfifo /tmp/pipe
dumpcap -q -i lo -a duration:10 -w - > /tmp/pipe
tshark -l -n -T pdml -w out.pcap -r - < /tmp/pipe
However, running Dumpcap by itself works perfectly fine with 0% packet loss.
dumpcap -i lo -a duration:10 -w -
I found two solutions, though neither is ideal.
Solution 1
Edit the PyShark code to run TShark by itself, so that it both captures the packets and outputs the data for PyShark to process.
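A minimal sketch of what Solution 1 might look like when run by hand, assuming the same loopback interface and 10-second duration as the tests above. The function name `capture_with_tshark`, the variable names, and the output file `out.pdml` are hypothetical, and whether this actually avoids the loss would need to be measured with the same tcpreplay test.

```shell
# Illustrative defaults; override them from the environment if needed.
TS_IFACE="${TS_IFACE:-lo}"
TS_DURATION="${TS_DURATION:-10}"
TS_PDML_OUT="${TS_PDML_OUT:-out.pdml}"   # hypothetical output file name

# Let tshark capture and dissect in one command, with no manual FIFO.
capture_with_tshark() {
    # -l  flush output after each packet
    # -n  disable name resolution
    # -i  capture interface
    # -a  autostop condition (stop after TS_DURATION seconds)
    # -T  output format (PDML, as in the commands above)
    tshark -l -n -i "$TS_IFACE" -a "duration:$TS_DURATION" -T pdml > "$TS_PDML_OUT"
}
```

Calling `capture_with_tshark` while the tcpreplay test is running, and then counting the packets in the PDML output, gives a figure to compare against the 9786 packets sent.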
Solution 2
Edit the PyShark code to use Gulp to write to disk more efficiently. However, Gulp appears to reach its maximum buffer size after about 10 seconds in a high-traffic test (about 5 MB/s), and I cannot find much documentation on how to increase its buffer size.
mkfifo /tmp/pipe
dumpcap -q -i lo -a duration:10 -w - | gulp -c > /tmp/pipe
tshark -l -n -T pdml -w out.pcap -r - < /tmp/pipe
Error:
gulp: ring buffer full
If you don't need to dissect the incoming data, then running dumpcap on its own is by far the best for performance. There may also be platform-specific capture options that will increase performance further. You also need to make sure that any files used, e.g. the temp file used by dumpcap and any output files, are stored on the fastest drives available.
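As a rough sketch of that advice, capture and dissection can be split into two separate steps, so that dumpcap only writes to disk during the capture and tshark reads the file afterwards. The function names, the file paths, and the 64 MiB buffer value are assumptions for illustration. `-B` sets dumpcap's capture buffer size in MiB, and the right value depends on the platform and traffic rate.

```shell
# Illustrative defaults; override them from the environment if needed.
IFACE="${IFACE:-lo}"
DURATION="${DURATION:-10}"
BUFFER_MIB="${BUFFER_MIB:-64}"                     # assumed value, tune per system
CAPTURE_FILE="${CAPTURE_FILE:-/tmp/capture.pcapng}" # ideally on the fastest drive
PDML_FILE="${CAPTURE_FILE%.pcapng}.pdml"

# Step 1: capture only, with no dissection in the path.
capture_packets() {
    dumpcap -q -i "$IFACE" -B "$BUFFER_MIB" -a "duration:$DURATION" -w "$CAPTURE_FILE"
}

# Step 2: dissect the finished capture file offline, at whatever speed tshark manages.
dissect_capture() {
    tshark -n -r "$CAPTURE_FILE" -T pdml > "$PDML_FILE"
}
```

Running `capture_packets` followed by `dissect_capture` trades live output for a capture path that is not slowed down by dissection, so it only fits cases where the dissected data is not needed in real time.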
Using tshark to capture will cause it to start an instance of dumpcap and pipe the traffic from dumpcap into tshark.
If running dumpcap on its own still drops packets, then you're going to have to investigate the world of expensive hardware solutions, e.g. specialised hardware capture NICs and/or appliances.
I see. So technically the developer of PyShark did not need to use dumpcap to pipe into tshark, but tshark on its own would be sufficient?
Presumably that was done for specific control of the processes.