Adding Suppressed Silence In RTP Audio Extracted via tshark

I have pcapsipdump running, which produces a separate PCAP file for each call. I typically use the Wireshark GUI to look at the flow sequence diagram and export the call audio using the procedure described here: https://wiki.wireshark.org/RTP_statistics. This works very well.

I analyze a lot of files, so I'm looking to automate this process using tshark. I have mostly managed to do so, with the exception of one issue, which I describe below.

For this example, I'm working with a PCAP for a VoIP call that has 2 streams using the G.711 PCMU payload. One stream uses silence suppression: no RTP packets are sent for varying amounts of time unless there is audio, as depicted in this screenshot.

Silence period and marker

This is a valid PCAP that follows the specification in https://tools.ietf.org/html/rfc3551#section-4.1:

For applications which send either no packets or occasional comfort-noise packets during silence, the first packet of a talkspurt, that is, the first packet after a silence period during which packets have not been transmitted contiguously, SHOULD be distinguished by setting the marker bit in the RTP data header to one. The marker bit in all other packets is zero.
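To make the marker bit concrete, here is a minimal Python sketch that decodes the 12-byte fixed RTP header and reports the marker bit along with the sequence number and timestamp. The function name `parse_rtp_header` is hypothetical, and the sketch assumes the raw RTP packet bytes (UDP payload) are already available; it ignores CSRC entries and header extensions.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (hypothetical helper)."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    # Network byte order: 2 flag bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,           # top two bits of the first byte
        "marker": bool(b1 & 0x80),    # top bit of the second byte
        "payload_type": b1 & 0x7F,    # 0 is PCMU
        "sequence": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }
```

For PCMU the RTP clock runs at 8000 Hz, so the timestamp counts audio samples, which is what makes it usable for measuring a silence gap.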

Wireshark seems to handle this correctly when exporting the audio file via the GUI as shown below. Both streams are of the same length.

Correct audio exported using Wireshark GUI

However, when I extract the RTP payload using tshark and then convert it to audio, the silence is missing and the resulting streams are not the same length.

Silence missing in tshark RTP payload export

I'm currently using a modified version of this script for my automation: https://gist.github.com/avimar/d2e9d05e082ce273962d742eb9acac16 (it does not run as-is on macOS Big Sur). I have also used a variation of this Python script: https://sdet.us/python-pcap-parsing-audio-from-sip-call/ and got the same unsynchronized result. I know I have to use the marker bit to identify where I need to insert 7.052418 seconds of silence. I'm just not sure how to accomplish this.

Can I get some pointers on what I need to do to add this silence back into the stream so the audio streams are synchronized? Bash-based or Python-based solutions would be really appreciated. Alternatively, if I can get a pseudo-algorithm for the process, I will try to implement it myself.
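One possible direction, as a rough sketch only: rather than relying on the marker bit alone, compare each packet's RTP timestamp with the timestamp expected after the previous payload and pad the difference with silence. This assumes PCMU at 8000 Hz (one payload byte per sample, so a timestamp gap equals a byte count), a single SSRC, and packets already in sequence order, for example from a tshark field export of `rtp.timestamp` and `rtp.payload`. The name `fill_silence` is hypothetical.

```python
ULAW_SILENCE = b"\xff"  # mu-law byte that decodes to a zero-amplitude sample

def fill_silence(packets):
    """Rebuild a PCMU stream, padding timestamp gaps with silence.

    packets: iterable of (rtp_timestamp, payload_bytes) in sequence order.
    Assumes one payload byte per sample (G.711) and a single SSRC.
    """
    out = bytearray()
    expected = None  # timestamp the next packet should carry
    for ts, payload in packets:
        if expected is not None:
            # RTP timestamps are 32-bit and wrap, so subtract modulo 2**32
            gap = (ts - expected) & 0xFFFFFFFF
            if gap >= 0x80000000:
                continue  # packet is behind the stream (late or duplicate): skip
            out.extend(ULAW_SILENCE * gap)
        out.extend(payload)
        expected = (ts + len(payload)) & 0xFFFFFFFF
    return bytes(out)
```

With this approach a 7.052418-second gap would come out as roughly 56,419 silence bytes at 8000 Hz. The sketch does not align the start times of the two streams; that would additionally need the capture time of each stream's first packet.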

Thanks for your time!