After a little more time: the 8000 Hz recording is simply how the decoded P25 voice gets written to the wav file, so that part is working the way it should. The real battle is the modeling, audio cleanup/filtering, and speech recognition.
They speak fast on the radio, the audio isn't normalized, and a lot of calls aren't very clear. The vocabulary a default model is weighted toward doesn't match how people actually talk on the radio, so I get some really awesome and funny results from DeepSpeech. It's trying. It just doesn't know about sentence fragments and radio "banter".
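For the cleanup side, here is a rough sketch of the kind of preprocessing I mean: peak-normalize the 8000 Hz wav and double the sample rate to 16000 Hz, which is the rate the stock DeepSpeech English models expect. The function names (`peak_normalize`, `upsample_2x`, `prepare_wav`) and the 0.9 target peak are my own placeholders, and it assumes mono 16-bit PCM input. The linear interpolation is crude and adds no detail above 4 kHz; a real resampler would do a better job.

```python
import array
import sys
import wave


def peak_normalize(samples, target_peak=0.9, full_scale=32767):
    """Scale 16-bit samples so the loudest one sits at target_peak of full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak * full_scale / peak
    # Clamp to the signed 16-bit range after scaling.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]


def upsample_2x(samples):
    """Double the sample rate by linear interpolation (e.g. 8000 Hz -> 16000 Hz)."""
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) // 2)  # midpoint between neighbouring samples
    return out


def prepare_wav(src_path, dst_path):
    """Read a mono 16-bit wav, normalize it, and write it at twice the rate."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # wav data is little-endian

    cleaned = array.array("h", upsample_2x(peak_normalize(samples)))
    if sys.byteorder == "big":
        cleaned.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate * 2)
        dst.writeframes(cleaned.tobytes())
```

Something like `prepare_wav("call_8k.wav", "call_16k.wav")` (file names made up) would give a louder, 16000 Hz copy to feed to the recognizer. It won't fix fast talkers or noisy calls, but it takes the level and sample-rate mismatch off the table.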
This is going to be a keyword, machine learning, and model training exercise, so making a model will take a machine with a lot of horsepower and a decent graphics card whose GPU you can leverage. Nvidia looks to be the best supported for this. Maybe it's possible to get close(r).