Thanks for the information. I think I have a good grasp on the setup now. I was previously under the impression that providers uploaded high bit rate streams and Broadcastify filtered, compressed, and transcoded the stream at the server level. Reading a little deeper into the depths of RadioReference, I found pages about configuring various streaming clients to be in line with the “standards.”
For years I listened to Radio Reference and then Broadcastify and tolerated the compression artifacts, other insertions of distortion, and generally low audio quality since I did not have my own scanner. For the last couple of years before I bought a scanner, I listened only infrequently, mainly just when something came up. I would have paid to join for premium content immediately had it opened an opportunity for higher quality feeds. Now that I am ready to provide my own feed for the benefit of my community, I am much more concerned about the quality and professionalism of a product that will be my own. I’d want to make sure that I do it right -- and that I am not restrained from doing it right. That said, there’s no doubt I will host at least my primary feed on Broadcastify, assuming approval by the administration. Being part of the clearinghouse for government radio streams is a no-brainer for anyone wanting to make a difference.
There are always ways to provide better quality, but you've got to account for the lowest common denominator to attract the largest audience.
I want to cover this topic at length. Before I get into the weeds, though, I assert that improving the feed quality and adding features would attract an even larger audience than a lowest-common-denominator, one-size-fits-all approach.
[size=+1]Sample Rate[/size]
When I asked about the sample rate, I was wondering why it was as high as 22 kHz (at 16 kbps, mp3 doesn't support much higher anyway). Much of my post will cover sample rate, since it is an aspect of the current standards that can be reconsidered without increasing data bandwidth.
At a channel bit rate of 16 kbps, there's not much bandwidth available to quantize each sample at a sample rate of 22 kHz (the sample rate determines how often the amplitude is measured). Quantization, related to the bit depth in uncompressed formats, is the mathematical (digital) representation of the true amplitude of each sample. Increased quantization resolution allows greater dynamic range and reduces noise (noise here being the accumulation of repeated inaccuracies in quantizing each sample -- the quantization error). In uncompressed audio, each added bit of depth doubles the quantization accuracy, while increasing the sample rate only linearly affects quality; with marginal quality and limited bandwidth, one should get a bigger improvement by favoring bit depth. Though mp3 compression makes bit depth more nuanced than with uncompressed PCM formats, I assume there is still a strong link between sample rate and quantization, even though bit depth is not an explicit value in mp3. To summarize: to improve the signal-to-noise ratio -- i.e. the ability to make sense of voice over background noise -- we want adequate quantization, which competes with the sample rate for bandwidth (the more samples in a given amount of time, the less bandwidth available to quantize each one).
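To put rough numbers on that trade-off, here's a back-of-the-envelope sketch of my own (treat it as directional only -- mp3's psychoacoustic model does not allocate bits this literally, so these are PCM-style ceilings, not mp3 measurements):

```python
# Crude illustration: how a fixed 16 kbps budget splits between
# sample count and per-sample resolution, using the ideal uniform
# quantizer rule of thumb (~6.02 dB of SNR per bit).

def bits_per_sample(bit_rate_bps, sample_rate_hz):
    """Average bits available to describe each sample."""
    return bit_rate_bps / sample_rate_hz

def quantization_snr_db(bits):
    """Theoretical SNR ceiling of an ideal uniform quantizer."""
    return 6.02 * bits + 1.76

for sr in (22050, 11025, 8000):
    b = bits_per_sample(16_000, sr)
    print(f"{sr:>5} Hz: {b:.2f} bits/sample -> ~{quantization_snr_db(b):.1f} dB ceiling")
```

The absolute numbers are silly-low because mp3 spends its bits far more cleverly than raw PCM, but the direction holds: halving the sample rate roughly doubles the bits left over to describe each sample.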
The sample rate relates to the highest frequency one can reproduce with digital audio. Frequencies above one half of the sample rate are lost (one half of the sample rate is the Nyquist frequency).
Actually, to be more accurate, the sample rate must be double the highest frequency for all frequencies to be correctly sampled. Frequencies above the Nyquist frequency can create distortion during downsampling or compression; however, I'd consider this insignificant for streaming voice radio traffic. Since the best of us can hear to around 20,000 Hz, common practice holds that sample rates of 44.1 kHz (CD quality) or 48 kHz are plenty high to cover the range of what we can interpret -- and that is for replicating beautiful, intricate music, not a dispatch robot. Unlike producing music, our goal isn't to perfectly replicate the range of sound perceptible to man; frankly, our goal isn't perfectly replicating anything: it is to provide the highest level of speech intelligibility (with a secondary goal of eliminating erratic level changes and compression artifacts). Many voice applications use a sample rate of 8 kHz, including the P25 ADC and vocoder. My WS1080 also stores recorded files at 8 kHz (and they sound great). Common sample rates in between include 11.025, 16, 22.05, and 32 kHz.
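For anyone who wants to see the Nyquist folding for themselves, here is a quick stdlib-only Python demonstration (my own toy example, nothing from any feed toolchain):

```python
import math

def aliased_frequency(f_hz, sample_rate_hz):
    """Frequency a pure tone appears at after sampling (folds about Nyquist)."""
    f = f_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)

# A 5 kHz tone sampled at 8 kHz folds down to 3 kHz -- right into the
# speech band, which is why an anti-alias lowpass belongs before any
# downsampling step.
print(aliased_frequency(5000, 8000))  # -> 3000

# Numerically: samples of a 5 kHz and a 3 kHz sine taken at 8 kHz are
# identical apart from a phase flip -- the sampler cannot tell them apart.
fs = 8000
for n in range(8):
    s_hi = math.sin(2 * math.pi * 5000 * n / fs)
    s_lo = math.sin(2 * math.pi * 3000 * n / fs)
    assert abs(s_hi + s_lo) < 1e-9
```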
The most important frequencies to preserve for speech intelligibility are around 500 Hz to 3500 Hz. The fundamental frequency of speech is ~100-300 Hz, with most of the sound energy being below 1000 Hz. For a good overview of speech intelligibility, read up here:
Facts about speech intelligibility
Therefore, a sample rate of 8 kHz (which covers up to 4 kHz) should be more than adequate for preserving the important parts of speech. Indeed, the frequency spectrum plot below, from Glenn's Washington County raw audio, supports this. Note the level around 300 Hz and the rapid fall-off at ~3500 Hz. The top plot is an average of several seconds of voice, and the bottom spectrum is an instant within the time period considered for the top plot. The audio sampled is from a 32-bit uncompressed file, so I am fairly confident that this is exactly what the encoder receives before his feed is compressed. Here's a sample of the audio (13 MB):
Source Audio
As another example, below you will find a frequency analysis of audio straight from a DSD+ decode of P25 Phase 1 audio. DSD+ saves it at 8 kHz, so there is no data above 4 kHz. Note the range of ~200 to 3,500 Hz. This is what it sounds like (I silenced someone's name, which means you won't be able to exactly replicate my plot):
DSD+ Sample
Now let's look at Glenn's feed source audio spectrum on a linear scale, which makes it easier to correlate to sample rate. I drew lines at several key frequencies: the Nyquist frequencies of several common audio sample rates (double each marked frequency to get the corresponding sample rate). For each sample rate, frequencies to the right of its line would not be replicated. The higher frequencies on this plot are so quiet that they are almost completely irrelevant. Even the lowest sample rate, 8 kHz, captures everything that we are interested in. One might make a case for going to 11025 Hz or 16000 Hz to reduce aliasing distortion in the event a lowpass filter at the Nyquist frequency was not applied to the stream before downsampling and compression.
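To illustrate what that anti-alias step looks like, here is a rough stdlib-only sketch of a windowed-sinc lowpass followed by decimation. The tap count and cutoff are illustrative choices of mine; in practice a tool like SoX or Audacity does this for you when it resamples:

```python
import math

def lowpass_fir(cutoff_hz, sample_rate_hz, num_taps=101):
    """Windowed-sinc (Hamming) lowpass FIR coefficients, unity DC gain."""
    fc = cutoff_hz / sample_rate_hz            # normalized cutoff (0..0.5)
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        h = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming
        taps.append(h * w)
    s = sum(taps)
    return [t / s for t in taps]               # normalize to unity gain at DC

def filter_and_decimate(samples, taps, factor):
    """Convolve with the FIR, then keep every `factor`-th sample.
    (Naive O(n*taps) loop -- fine for a demonstration.)"""
    out = []
    for i in range(0, len(samples), factor):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i - k
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out

# e.g. 22050 Hz -> 11025 Hz: filter below the new 5512.5 Hz Nyquist first
taps = lowpass_fir(5000, 22050)
```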
An aside about "noise": unwanted noise is almost inevitable when streaming scanner traffic. Quality and reception problems on the radio user's side, scanner reception, digital decoding, distorted audio levels, audio cables, power hum, processing distortion, and compression all have the potential to add noise to the final product (tips for reducing this are in the links Mr. Blanton shared above). Therefore, I'm not terribly concerned about noise added during downsampling. In fact, adding low-level random noise to the audio likely improves the intelligibility of the speech. The hums and hisses we let in are usually organized in character, which draws our attention; ideally the noise floor would be not only low but random in quality. Unless there are large, unique spikes above the Nyquist frequency, aliasing distortion should be mostly random in nature. The process of dithering, which would probably improve many feeds, involves intentionally inserting low-level white noise into the audio to mask distortion and artifacts (perhaps another post about that in the future if I can refine a procedure for streams).
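For the curious, the core of dithering is tiny and easy to sketch. Here is my own stdlib-only illustration of TPDF (triangular) dither added before requantization -- in practice you'd let SoX or Audacity apply dither for you, so treat this purely as a concept demo:

```python
import random

def quantize_with_dither(x, step, rng=random.random):
    """Quantize sample x to the nearest multiple of `step`, after adding
    TPDF dither spanning roughly +/- one quantization step.

    The dither randomizes the quantization error, turning the organized,
    attention-grabbing distortion of a bare quantizer into a benign,
    steady noise floor."""
    dither = (rng() - rng()) * step   # triangular PDF, width ~2 steps
    return step * round((x + dither) / step)
```

Passing `rng=lambda: 0.0` disables the dither, which makes it easy to A/B the two behaviors on a slow ramp and see the staircase pattern dissolve into noise.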
The current Broadcastify "standard" requests providers to use mono feeds at a constant bit rate of 16 kbps with a sample rate of 22050 Hz. Below you will find a frequency spectrum from the Broadcastify archive version of Glenn's audio feed. Note the lack of meaningful frequency response between 5.6 and 11 kHz.
As you might notice from the following plot, from a replication standpoint, there is little gained by a 22050 Hz sample rate. Sample rates of 8 kHz or 11025 Hz should be plenty adequate for the purpose of streaming scanner traffic and would allow more emphasis on quantization.
I encoded Glenn's source audio in over a dozen different mp3 formats to compare quality. At the 16 kbps constant bit rate, I compared 8, 11, 16, and 22 kHz sample rates. I found the lower sample rates to be more dynamic, with more variability in amplitude compared to the duller 22 kHz version. The 22 kHz sample is covered in compression artifacts, including clicks, a more distinct rushing-water hiss, metallic sounds, and other background noise I just can't describe (in fact, I think I heard a P25 control channel in there…). The artifacts introduced in the 22 kHz audio are incredibly fatiguing to listen to for long periods, and the voice quality is not as good as the 8 kHz audio. Overall, the 22 kHz audio was just loud, while the 8 kHz audio had more dynamics between soft, medium, and loud.
Audio File Samples (constant bit rate):
Original Source:
Short Source
Broadcastify Archive:
Streamed Copy
16 kbps, 22.05 kHz:
22 kHz
16 kbps, 16 kHz:
16 kHz
16 kbps, 11 kHz:
11 kHz
16 kbps, 8 kHz:
8 kHz
Comparison (22.05 kHz, then 8 kHz, then 22.05 kHz again):
22 vs 8 kHz
More objectively, I compared the various audio samples with Audacity's "Contrast" tool, which measures average foreground and background sound levels to gauge intelligibility. The analyzer, which was created through a federal grant, says that it is ideal to have at least a 20 dB difference between foreground and background audio.
While the trend was that the lower the sample rate, the lower the average volume, the background noise was disproportionately quieter, which means it is easier to understand the spoken foreground. The 8 kHz audio had a 0.3 dB greater foreground-background difference compared to the 22.05 kHz sample.
The 11 kHz audio had the greatest difference, at 19.5 dB.
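For anyone curious what that contrast figure boils down to, here is my understanding of the arithmetic, sketched in stdlib Python (Audacity measures RMS over user-selected foreground and background regions; this just shows the dB-difference part):

```python
import math

def rms_db(samples):
    """RMS level in dB for samples normalized to [-1, 1]."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

def contrast_db(foreground, background):
    """Foreground-minus-background level difference, in the spirit of
    Audacity's Contrast tool (which targets >= 20 dB for good
    intelligibility)."""
    return rms_db(foreground) - rms_db(background)

# Toy check: foreground 10x the amplitude of background -> 20 dB contrast
print(contrast_db([0.5] * 100, [0.05] * 100))  # -> 20.0
```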
In the following visual comparison of the waveforms, I amplified the 22 kHz file to clipping and then applied the same amplification ratio to the 8 kHz file. One can see that, even though the top waveform is louder overall, there are several areas where the greater dynamic range of the 8 kHz audio can be identified, as well as other areas of variability between the files. Unfortunately, the codec can't identify what is "noise" to us, so the background hum, static, etc. is treated as very important by the encoder (it thinks it might be some really important lull in a symphony, so it tries hard to encode it). So even in the background "noise," there is much greater dynamic range and replication in the 8 kHz file, even though that is not necessarily desirable. It would be great to find an encoder that recognized that as noise and trashed or deemphasized it.
Finally, here is a visual comparison of the two clips on a decibel waveform plot. On the far left and right sides especially, I notice greater intricacy and dynamic replication in the bottom (8 kHz) audio file. In the middle, lower-frequency time period, there are several peaks that I believe demonstrate quantization error: the peaks in the top (22 kHz) file sit at the same level, while the 8 kHz file, which should have higher quantization resolution, shows the same peaks as distinctly different values.
While a lower sample rate is acceptable and perhaps ideal for replication accuracy, compatibility could be a contraindication. Testing on the common man's electronic devices would be prudent to make sure files with lower sample rates are properly decoded (i.e. everything I describe works on my computer, but the software, codecs, and browsers I'm using might not be average).
[size=+1]Bit Rate and Codec[/size]
Wow, so that was a lot about sample rates. Moving on to other audio quality considerations… Since 16 kbps is such a low data rate, any improvement has the potential for substantial marginal benefit. Encoding at a constant bit rate is incredibly inefficient, especially for scanner traffic, since there is so much silence between the audio of interest. Not only would moving to average or variable bit rate encoding vastly improve quality, it would likely decrease server load. While the variety of encoding software able to accomplish this is more limited, and some listeners might be forced into the twenty-teens, there's room for substantial improvement.
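To see why variable bit rate pays off so handsomely on scanner traffic, the weighted-average arithmetic is enough. The per-state bit rates below are illustrative assumptions of mine, not measured encoder behavior:

```python
def avg_bitrate_kbps(voice_fraction, voice_kbps=24.0, silence_kbps=8.0):
    """Rough average bit rate of a VBR stream that spends `voice_kbps`
    on speech and drops to `silence_kbps` during dead air.
    (Assumed illustrative numbers, not any particular encoder.)"""
    return voice_fraction * voice_kbps + (1 - voice_fraction) * silence_kbps

# A feed that carries voice only 25% of the time could spend 24 kbps on
# every transmission and still average less than today's 16 kbps CBR:
print(avg_bitrate_kbps(0.25))  # -> 12.0
```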
Next, it just doesn't make sense to use a codec designed to compress music for scanner traffic. There are codecs designed specifically for speech, as well as other general-purpose codecs, such as Opus, that are superior to mp3; many of these inherently utilize variable bit rates. Though no codec is universally compatible, many are common; most listeners would either have no problem at all or could obtain free software to play the stream. 16 kbps is far from mp3's sweet spot of 128 kbps. Additionally, it would be nice to have an encoder tailored to the characteristics of scanner traffic, such as low-bit-rate replication and focusing on the frequencies important to speech intelligibility while ignoring noise from the radio or other introduced sources.
I assert it's worth considering an increase in the bit rate standard. Even if stuck with constant bit rate mp3 encoding, upping to 24 or 32 kbps for a mono feed would ease most of the artifacts that can make feeds hard to understand and painful for extended listening. Data is cheaper and more accessible these days, even on mobile devices. Combine a bit rate increase with the addition of a codec like Opus, and you'd be providing a very high quality feed.
Examples:
~ 16 kbps variable bit rate mp3, 8 kHz
32 kbps constant bit rate, 11 kHz
~ 28 kbps variable bit rate, 16 kHz
Note: The two approximate variable bit rates were determined from a much longer sample with more silence, characteristic of a scanner feed. In the examples above, I included only voice, so the average bit rate is higher than what I have listed.
Another area to look into is adaptive bit rate streaming, which is offering the same content in multiple formats and/or qualities. Correct me if I am wrong, but Broadcastify runs on Icecast, so it should already be capable of hosting multiple streams per feed. Adding a "premium" high quality stream through adaptive streaming would let the lowest common denominator still access their scratchy 16 kbps mp3, while the high quality streams would expand the service to a wider audience (and perhaps increase the listening time of the current audience) and provide another perk for the paid membership. This technique would require the provider to encode and upload multiple versions of their stream. But it could also be used to promote the best feeds on the site: feeds reviewed and meeting the premium quality standard could be made more visible, giving first-time visitors a good listening experience and increasing the odds they return. Getting up to 96 kbps would be out of this world different.
Other considerations for a higher quality feed include:
- Balance audio (for stereo feeds) and reduce hum and static, which further degrade when encoded.
- Be careful with compression (dynamic range compression, that is). I understand it is advocated on this site, but I would stay away from it: with radio traffic, it has the potential to greatly amplify keyup clicks, pops, background noise, and other distortions. A better approach is to ensure proper gain settings. Systems or frequencies that don't meet the target volume should be individually adjusted at the radio via auto gain control and/or audio boost. As for individual calls varying in volume, that's important to relay in the feed: an officer whispering, someone shouting, or a mumbling dispatcher are all nuances worth maintaining for the listener.
- Apply low-pass and high-pass filters before the encoder. Especially with analog systems that carry data, the low end (up to ~200 Hz) should be removed. Everything above the Nyquist frequency should be removed with a low-pass filter. For any odd hum or distortion that remains in the middle, use a notch filter. Test the addition of white noise (dithering).
- Investigate and develop a way to keep digital traffic digital all the way to the streaming encoder.
- And in summary: look into a lower sample rate (I suggest either 8000 or 11025 Hz), consider upping the bandwidth standard, consider or develop new codecs, utilize variable bit rate, and promote adaptive streaming. Ideally, adaptive streaming would be implemented with an option for a very high quality feed; if that's not feasible, relax the standards to allow mono feeds of 32 kbps constant mp3 at either 8 or 11.025 kHz, or ~24 kbps variable mp3 at 8 kHz.
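To make the filtering suggestion concrete, here is a stdlib-only Python sketch of the high-pass portion, using the well-known RBJ Audio-EQ-Cookbook biquad formulas (the 200 Hz cutoff matches my suggestion above; in practice you would just use the filters built into SoX or Audacity):

```python
import math

def highpass_biquad(f0_hz, sample_rate_hz, q=0.7071):
    """RBJ Audio-EQ-Cookbook highpass biquad coefficients, normalized by a0."""
    w0 = 2 * math.pi * f0_hz / sample_rate_hz
    alpha = math.sin(w0) / (2 * q)
    cosw = math.cos(w0)
    b = [(1 + cosw) / 2, -(1 + cosw), (1 + cosw) / 2]
    a = [1 + alpha, -2 * cosw, 1 - alpha]
    return [x / a[0] for x in b], [x / a[0] for x in a]

def biquad_filter(samples, b, a):
    """Direct Form I filtering with the normalized coefficients above."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x1, x2 = x, x1
        y1, y2 = y, y1
        out.append(y)
    return out

# 200 Hz high-pass at an 8 kHz sample rate, to strip low-end data/hum:
b, a = highpass_biquad(200, 8000)
```

A quick sanity check on any high-pass design: feed it DC (a constant signal) and the output should decay to zero, since the rumble and hum we are trying to remove live down near DC.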
Regarding delay, which seems to be a particularly hot-button issue around here: it is inherent and impossible to overcome. For a digital system, delay starts at the user's radio as it processes and encodes his voice. Delay is further introduced at the site control and repeater as the transmission is processed, encoded, and time-synchronized. Finally, our scanners receive the traffic, which is then delayed by the feed provider's encoder, the upload, Broadcastify's relay, the listener's download, and the listener's buffer. While the feeds are "live," they are all delayed -- just as we still call cable news on satellite TV "live," even though there is a substantial delay in delivering the content to the viewer. To truly test this impact, we need radio users to make a call while listening to the stream and time the delay. All the excitement about delays cracks me up. Indeed, I would appreciate a delay (or a last-5-minutes option)... a lot of times when I hear the sirens coming, I miss the dispatch tones and additional information by the time I get the stream going.
“Real time audio feed for both channels is being supplied by outputs from a Modular Communication Systems Ultra-Comm E911 dispatch console” -- Cleburne, TX PD
Congratulations to the City of Cleburne!
Every government agency should have their console plugged straight into a Broadcastify feed. I would love to get in touch with whoever got this going and see if we can spread the trend. However, I'm not sure I'd consider the feed great quality. The left (police) channel seems hotter than the right, and the right channel has a bad continuous hum and tone, plus the continuous compression artifacts that go along with any audio output above complete silence. Turning off the right channel makes it much more tolerable. Other than the degradation from 16 kbps constant encoding, the left channel of the feed is superior to most others on the service.
I am a paid member because I very strongly support the effort and product of this service. In addition to relaying life- and property-saving information, this service is a worldwide leader in transparency and a top primary source of unedited insight into our government and communities. Words can't describe how great it is. If I weren't invested in seeing it succeed and propagate, I would not have contributed such a thorough analysis. I understand that there is no such thing as a free lunch, but hundreds of feed providers are actually providing a free lunch to the service because they believe in the cause. Everyone should agree that considering quality enhancements is prudent and that it could potentially grow the user base substantially.
I would like, if allowed, to experiment with various codecs and techniques on a test stream on the site. Running tests through my own server isn't as reliable an indicator of how different implementations would work on Broadcastify and on the diverse hardware and software of the listener base.
I am very much looking forward to reading others' compression and streaming expertise, as well as hearing what I got right and what I got wrong.
Please also pass along those feeds that you consider to be of exceptional quality (and if it is yours, what you did to make it great).
Cordially,
Justin