Adding Suppressed Silence In RTP Audio Extracted via tshark

asked 2021-01-11 07:36:12 +0000

3x0du5

I have pcapsipdump running, which produces a separate PCAP file for each call. I typically use the Wireshark GUI to look at the flow sequence diagram and export the call audio using the procedure described here: https://wiki.wireshark.org/RTP_statis.... This works very well.

I want to automate this process with tshark because I analyze a lot of files. I have mostly managed to do so, except for one issue, which I describe below.

For this example, I'm working with a PCAP of a VoIP call that has 2 streams using the G.711 PCMU payload. One stream uses silence suppression: unless there is audio, no RTP packets are sent for varying amounts of time, as depicted in this screenshot.

Silence period and marker

This is a valid PCAP and follows the specification in https://tools.ietf.org/html/rfc3551#s...:

For applications which send either no packets or occasional comfort-noise packets during silence, the first packet of a talkspurt, that is, the first packet after a silence period during which packets have not been transmitted contiguously, SHOULD be distinguished by setting the marker bit in the RTP data header to one. The marker bit in all other packets is zero.

Wireshark seems to handle this correctly when exporting the audio file via the GUI as shown below. Both streams are of the same length.

Correct audio exported using Wireshark GUI

However, when I extract the RTP payload using tshark and then convert it to audio, the silence is missing and the resulting streams are not the same length.

Silence missing in tshark RTP payload export

I'm currently using a modified version of this script for my automation: https://gist.github.com/avimar/d2e9d0... (it does not run as is on macOS Big Sur). I have also used a variation of this Python script: https://sdet.us/python-pcap-parsing-a... and got the same unsynchronized result. I know I have to use the marker bit to identify where I need to insert 7.052418 seconds of silence. I'm just not sure how to accomplish this.

Can I get some pointers on what I need to do to add this silence back into the stream so the audio streams are synchronized? Bash based or Python based solutions will be really appreciated. Or if I can get some pseudo algorithm behind the process, I will try to implement it myself.

Thanks for your time!


answered 2021-01-11 12:24:36 +0000

Jaap


You are on your way to implementing an RTP receiving endpoint, with all the niceties of jitter buffering, wander, codec conversion, etc. This can be a complex beast. Since you already have some bits and pieces, let's focus on the handling of silence suppression. For this to work you need to keep your own sample playout time and track that timebase against the incoming packets. If no packet is available for the current playout time, you have to insert the appropriate amount of silence yourself (based on the latest comfort noise parameters received). If a packet is available, it has to be interpreted either as an update to the comfort noise parameters or as a packet full of audio samples.
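As a rough illustration of the silence handling described above, here is a minimal Python sketch. It rests on several assumptions that are not from the thread: the input comes from something like `tshark -r call.pcap -Y "rtp.ssrc == <ssrc>" -T fields -e rtp.timestamp -e rtp.payload` (one tab-separated line per packet, payload as hex with or without colons depending on the tshark version), the packets are already in order, and every payload is PCMU at 8000 Hz, so one payload byte equals one RTP timestamp tick. It fills gaps with plain digital silence (the mu-law byte 0xFF) instead of generating comfort noise from received parameters, and it has no jitter buffer. The function names `parse_line` and `fill_silence` are made up for this example.

```python
PCMU_SILENCE = b"\xff"   # mu-law code for a zero-amplitude sample
TS_MODULO = 1 << 32      # RTP timestamps are 32-bit and wrap around


def parse_line(line):
    """Parse one 'timestamp<TAB>payload-hex' line (assumed tshark field output)."""
    ts, payload_hex = line.rstrip("\n").split("\t")
    return int(ts), bytes.fromhex(payload_hex.replace(":", ""))


def fill_silence(packets):
    """Build a continuous PCMU byte stream from (rtp_timestamp, payload) pairs.

    Assumes packets are in playout order and carry one byte per sample.
    Any jump in the RTP timestamp is filled with digital silence.
    """
    out = bytearray()
    expected_ts = None  # timestamp the next packet should carry if nothing was skipped
    for ts, payload in packets:
        if expected_ts is not None:
            gap = (ts - expected_ts) % TS_MODULO
            if gap >= TS_MODULO // 2:
                # Timestamp went backwards: late or duplicate packet, dropped in this sketch.
                continue
            out += PCMU_SILENCE * gap
        out += payload
        expected_ts = (ts + len(payload)) % TS_MODULO
    return bytes(out)
```

With 20 ms packets (160 bytes each), a packet whose timestamp is 800 ticks past the expected value yields 800 bytes, that is 100 ms, of inserted silence. The marker bit is not needed for the arithmetic itself; it only confirms that a timestamp jump is the start of a talkspurt and not packet loss.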

Timekeeping at your end is probably the crucial part here.
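One possible way to keep a common timebase across both streams is sketched below in Python. It is an illustration under assumptions, not a full receiver: each stream is assumed to be PCMU at 8000 Hz, already decoded into a gap-free byte string, and the capture time of its first packet (for example from tshark's `frame.time_relative` field) is used as its start time. The helper names `seconds_to_samples` and `align_streams` are hypothetical.

```python
SAMPLE_RATE = 8000       # G.711 sample rate in Hz
PCMU_SILENCE = b"\xff"   # mu-law code for a zero-amplitude sample


def seconds_to_samples(seconds, rate=SAMPLE_RATE):
    """Convert a duration in seconds to a whole number of samples."""
    return round(seconds * rate)


def align_streams(streams):
    """Pad streams with silence so they start and end together.

    streams maps a name to (first_packet_capture_time_s, pcmu_bytes).
    Returns a dict of name -> padded pcmu_bytes, all of equal length.
    """
    t0 = min(start for start, _ in streams.values())
    # Leading silence: how much later than the earliest stream each one starts.
    padded = {
        name: PCMU_SILENCE * seconds_to_samples(start - t0) + audio
        for name, (start, audio) in streams.items()
    }
    # Trailing silence: extend every stream to the length of the longest one.
    longest = max(len(audio) for audio in padded.values())
    return {
        name: audio + PCMU_SILENCE * (longest - len(audio))
        for name, audio in padded.items()
    }
```

For scale, the 7.052418 second gap from the question corresponds to `seconds_to_samples(7.052418)`, which is 56419 samples at 8000 Hz.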


Comments

@Jaap, thank you for your answer. I'm still very much a newbie in this field, and you've mentioned quite a few concepts in your answer that I have no idea about. I will do further research and get back here if I have more questions.

3x0du5 (2021-01-11 17:09:22 +0000)
