Audio file to text?

Status
Not open for further replies.

snoack321

Member
Joined
May 4, 2022
Messages
25
Reaction score
6
Location
Pleasanton, CA
Is there a ProScan add-on, or other software, that can convert audio files to text for streaming or hosting on a web server? Thanks in advance.
 

a417

Active Member
Joined
Mar 14, 2004
Messages
4,665
Reaction score
3,530
Can you describe your use case a little better? There are many ways to skin a cat, but I don't want you to be led down a dark and winding road that could end up in misery.
 

snoack321

Member
Joined
May 4, 2022
Messages
25
Reaction score
6
Location
Pleasanton, CA
LOL, didn’t mean to be opaque. I finally figured out a file-saving hierarchy that helps me find an audio file containing something pertinent to my neighborhood. But it’s still an audio file, and it would greatly help me share information with the neighborhood if I had a transcript (70% accurate) of the event. A bit wordy, but that’s the use case. I’ve tried several apps and online services, but thus far none appear to be remotely useful (InqScribe, Otter.ai, IBM Watson). TIA, Scott
 

KD0TAZ

Member
Joined
Dec 26, 2010
Messages
334
Reaction score
16
Location
Kansas
Radio transcription is pretty awful with off-the-shelf speech recognition products. I've tried a few and it really doesn't work at all. Personnel tend to talk fast and their jargon confuses it (regular speech-to-text models are designed to interpret standard conversational or dictated speech, which doesn't include 10-codes, phonetic spellings, lots of last names, street names, etc.). There is a guy who is using AI and training models to enhance it, but it's still not there yet.
 

xicarusx

Member
Feed Provider
Joined
Oct 2, 2008
Messages
116
Reaction score
84
Location
Sayre, PA
So I saw this post the other day and decided to give it a shot myself. Yeah, it's not perfect, and it can be weird at times, but it's kinda fun.

My transcription from the call (Actual):
Department 21, traffic control requested, Wysox Township. In front of the Sheetz on golden mile road. For motor vehicle accident. No injuries. Wysox three on scene.


The software's transcription of the call:
Department twenty one, traffic control requested. Wisex Township. In front of the sheets and the golden mile road. Front motor vehicle exit. No injuries. Wisex three n c.
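
One way to put a number on how close the two versions are is a word error rate: the word-level edit distance divided by the number of words in the reference. The sketch below is only an illustration and is not part of the software used for the transcription above. The function names `normalize` and `word_error_rate` are made up for this example, and the normalization (lowercase, punctuation stripped) is an assumption.

```python
import re


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")

    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


actual = ("Department 21, traffic control requested, Wysox Township. "
          "In front of the Sheetz on golden mile road. For motor vehicle "
          "accident. No injuries. Wysox three on scene.")
software = ("Department twenty one, traffic control requested. Wisex Township. "
            "In front of the sheets and the golden mile road. Front motor "
            "vehicle exit. No injuries. Wisex three n c.")

print(f"Word error rate: {word_error_rate(actual, software):.0%}")
```

Note that this counts "21" against "twenty one" as errors, so converting number words to digits first would give a fairer score.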
 

xicarusx

Member
Feed Provider
Joined
Oct 2, 2008
Messages
116
Reaction score
84
Location
Sayre, PA
What software did you use for this xicarusx?

I used Python and an API called Deepgram. Example script below. You will need a Deepgram API key, which is free and will allow you to transcribe 200 hours of audio before you need to pay. I have found the phone call model works best, and the enhanced tier gives even better results.

Python:
import asyncio
from deepgram import Deepgram

deepgram_api_key = '' # insert your api key here
path_to_file = '/home/pi/audio/Litchfield_26_2022_07_29_13_23_26.mp3' # Absolute path to file
file_mime_type = 'audio/mp3' # Mimetype of the file Examples: 'audio/wav' 'audio/mp3'

async def main():
    # Initializes the Deepgram SDK
    dg_client = Deepgram(deepgram_api_key)

    # Open the audio file in binary mode and send it as a buffer
    with open(path_to_file, 'rb') as audio:
        source = {'buffer': audio, 'mimetype': file_mime_type}
        # Options: punctuation, the phone call model, and the enhanced tier
        options = {'punctuate': True, 'language': 'en', 'model': 'phonecall', 'tier': 'enhanced'}

        response = await dg_client.transcription.prerecorded(source, options)
        transcript = response["results"]["channels"][0]["alternatives"][0]["transcript"]
        if transcript:
            print(transcript)
        else:
            print("No Transcription")

asyncio.run(main())

Example of a recent call:
Dispatch Audio: https://bcfirewire.com/audio/Litchfield_26_2022_07_29_13_23_26.mp3
Transcription: "Apartment twenty six. Curt Valley m Mass and Rescue Lakeville Township near the intersection of Riverside Drive in Park Hollow Road for working vehicle fire."

It's not perfect, but you can get the type of call almost 95% of the time. I have worked out a way to get the road name and call type from these transcriptions.

 

DC31

Member
Feed Provider
Joined
Feb 19, 2011
Messages
1,650
Reaction score
184
Location
Massachusetts
Interesting @xicarusx. It really does quite well. It has trouble with road names and town names but otherwise is fairly accurate.
 

xicarusx

Member
Feed Provider
Joined
Oct 2, 2008
Messages
116
Reaction score
84
Location
Sayre, PA
How are you deciphering?

I look for keywords, based on how the dispatches and street names work in my area.

Example:

Python:
# Load the transcript from the JSON response into a variable and make it lowercase.
transcript = response["results"]["channels"][0]["alternatives"][0]["transcript"].lower()

# create a variable to hold the action we find.
action = ""

if "tree down" in transcript or "trees down" in transcript:
    action = "Tree down"
elif "alarm activation" in transcript or "fire alarm" in transcript or "brush fire" in transcript or "structure fire" in transcript or "vehicle fire" in transcript or "unknown type of fire" in transcript:
    if "fire alarm" in transcript or "alarm activation" in transcript:
        action = "Automatic Fire Alarm"
    elif "brush fire" in transcript:
        action = "Brush Fire"
    elif "vehicle fire" in transcript:
        action = "Vehicle Fire"
    elif "structure fire" in transcript:
        action = "Structure Fire"
    else:
        action = "Unknown type of fire"
else:
    action = "Unknown"
   
# find road, street, avenue, parkway, way, drive, lane, route, and get info from it
if "route" in transcript:
    # Keep the text after "route", up to the end of that sentence.
    post = transcript.split("route")[1]
    address = post.split(".")[0].lstrip()
    # Since route is usually called by number "County Route 64" change any text numbers to integers
    address = text2int(address)
   
elif "street" in transcript:
    # Take the words that come before the first "street".
    words = transcript.split("street")[0].split()
    directions = {"north": "N.", "south": "S.", "east": "E.", "west": "W."}
    # Check if the second word from last is a direction "North Main Street"
    if len(words) >= 2 and words[-2] in directions:
        address = directions[words[-2]] + " " + words[-1].capitalize()
    # Check if the third word from last is a direction "North Hammond Dam Street".
    # Keep both words of the street name, not just the first one.
    elif len(words) >= 3 and words[-3] in directions:
        address = directions[words[-3]] + " " + words[-2].capitalize() + " " + words[-1].capitalize()
    elif words:
        address = words[-1].capitalize()
    else:
        # Nothing was said before "street", so there is no name to report.
        address = "Unknown"
else:
    address = "Unknown"

print(address + " | " + action)
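
The same keyword matching can also be sketched as a lookup table, which makes it easier to add call types later. This is an alternative layout for illustration, not the code the script above runs: the names `CALL_TYPES` and `classify_call` are hypothetical, and the phrases mirror the checks above.

```python
# Each entry pairs a call type with the phrases that signal it. Order matters:
# the first entry with a matching phrase wins, as in an if/elif chain.
CALL_TYPES = [
    ("Tree down", ("tree down", "trees down")),
    ("Automatic Fire Alarm", ("fire alarm", "alarm activation")),
    ("Brush Fire", ("brush fire",)),
    ("Vehicle Fire", ("vehicle fire",)),
    ("Structure Fire", ("structure fire",)),
    ("Unknown type of fire", ("unknown type of fire",)),
]


def classify_call(transcript):
    """Return the first call type whose phrase appears in the transcript."""
    text = transcript.lower()
    for call_type, phrases in CALL_TYPES:
        if any(phrase in text for phrase in phrases):
            return call_type
    return "Unknown"


print(classify_call(
    "Apartment twenty six. Curt Valley m Mass and Rescue Lakeville Township "
    "near the intersection of Riverside Drive in Park Hollow Road for "
    "working vehicle fire."
))  # prints "Vehicle Fire"
```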


The text to int code:

Python:
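def is_number(x):
    # True only for whole numbers such as "64" or "1,000", because
    # from_numword() below converts the value with int(). Checking with
    # float() would let "6.5" or "nan" through and crash there.
    if isinstance(x, str):
        x = x.replace(',', '')
    try:
        int(x)
    except (TypeError, ValueError):
        return False
    return True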


def text2int(textnum, numwords={}):
    units = [
        'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
        'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen',
        'sixteen', 'seventeen', 'eighteen', 'nineteen',
    ]
    tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
    scales = ['hundred', 'thousand', 'million', 'billion', 'trillion']
    ordinal_words = {'first': 1, 'second': 2, 'third': 3, 'fifth': 5, 'eighth': 8, 'ninth': 9, 'twelfth': 12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    if not numwords:
        numwords['and'] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    textnum = textnum.replace('-', ' ')

    current = result = 0
    curstring = ''
    onnumber = False
    lastunit = False
    lastscale = False

    def is_numword(x):
        # Test the argument itself rather than the loop variable "word"
        # picked up from the enclosing scope.
        return is_number(x) or x in numwords

    def from_numword(x):
        if is_number(x):
            scale = 0
            increment = int(x.replace(',', ''))
            return scale, increment
        return numwords[x]

    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0
            onnumber = True
            lastunit = False
            lastscale = False
        else:
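            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    candidate = "%s%s" % (word[:-len(ending)], replacement)
                    # Only strip the ending when the result is a number word,
                    # so ordinary words such as "north" are left alone.
                    if candidate in numwords:
                        word = candidate
                        break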

            if (not is_numword(word)) or (word == 'and' and not lastscale):
                if onnumber:
                    # Flush the current number we are building
                    curstring += repr(result + current) + " "
                curstring += word + " "
                result = current = 0
                onnumber = False
                lastunit = False
                lastscale = False
            else:
                scale, increment = from_numword(word)
                onnumber = True

                if lastunit and (word not in scales):
                    # Assume this is part of a string of individual numbers to
                    # be flushed, such as a zipcode "one two three four five"
                    curstring += repr(result + current)
                    result = current = 0

                if scale > 1:
                    current = max(1, current)

                current = current * scale + increment
                if scale > 100:
                    result += current
                    current = 0

                lastscale = False
                lastunit = False
                if word in scales:
                    lastscale = True
                elif word in units:
                    lastunit = True

    if onnumber:
        curstring += repr(result + current)

    return curstring
 

cpg178

Member
Premium Subscriber
Joined
Sep 7, 2014
Messages
436
Reaction score
140
Does Deepgram learn over time if you continuously throw audio files at it and have a way to give it the correct transcription?
 

nosoup4u

Member
Feed Provider
Joined
Jan 30, 2002
Messages
2,257
Reaction score
434
Location
High Bridge, NJ
I got some funky results also, but it is still much better than my previous attempts trying to decode the audio.
 
Joined
May 9, 2023
Messages
1
Reaction score
0
Location
Bellingham, WA
I am searching for the right methods, processes, technologies and group to invest resources in and commit myself to, until highly accurate and satisfactory transcription of emergency personnel radio communications has been developed to the point where it can be easily implemented and we can have access to written files.
Wonderful job on using AI and an API. I was grateful to see that you were using the tech before it became as widely popular as it has in the last few weeks.
Although I am aware that the technology is more than a few weeks old, I am not well versed in it. Just today I started on my own API code for a different project that I'm working on.
Which brings me to my question, and I apologize for being so long-winded.

In the conversation above, someone asked, "How are you deciphering?"
You replied that you are using the street names or something.
What are you having to decipher?
Have there been any recent updates? Do you have any information, hints or help that you could throw my way, anything you suggest that I do or don't do, or could you point me toward great resources that I can study to make this system great?

Thank you for your time.
 

millam

Old Radio Guy
Premium Subscriber
Joined
Jan 18, 2005
Messages
795
Reaction score
192
Just wondering if anyone has successfully used Word's "Transcribe" function to convert audio
to text. I tried it on an ATIS site and it didn't work too well (10%). I tried a NOAA weather channel and it worked
pretty well. I haven't got it to work using a VB-Cable out of DSDPlus yet; I just started trying. Setting
up VB-Cable A out of SDR# into DSDPlus, and VB-Cable B out of DSDPlus into Word's Transcribe
function, isn't working yet, but maybe I can work it out. Any help would be appreciated.

Mil
 

Chance

Member
Premium Subscriber
Joined
Dec 19, 2002
Messages
109
Reaction score
23
Location
Sachse, Texas
In my experience so far, P25 Phase II quality is too poor for speech recognition. The best success I have had is with the automatic dispatching of fire calls. I use Microsoft Azure Cognitive Services with a custom trained model. The model consists of 300 WAV files with text file transcripts. I also included a vocabulary of all call types and all street names in my city. I then run the output through about a hundred "find and replace" string operations to get an output that is about 95% correct.
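
The find-and-replace step described above can be sketched in plain Python. This is an illustration under assumptions, not the actual setup: it does not touch Azure at all, the `REPLACEMENTS` entries are taken from the sample transcripts earlier in the thread, and the `KNOWN_NAMES` vocabulary is hypothetical. The fuzzy lookup with the standard library's `difflib` is an extra idea swapped in here, not something the post describes.

```python
import difflib
import re

# Hypothetical corrections for phrases the recognizer gets wrong the same
# way every time.
REPLACEMENTS = {
    "wisex": "Wysox",
    "the sheets": "the Sheetz",
    "three n c": "three on scene",
}

# Hypothetical vocabulary of local names used for fuzzy matching.
KNOWN_NAMES = ["Wysox", "Litchfield", "Riverside", "Park Hollow"]


def apply_replacements(text, replacements=REPLACEMENTS):
    """Run case-insensitive, whole-phrase find-and-replace over the text."""
    for wrong, right in replacements.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        text = re.sub(pattern, right, text, flags=re.IGNORECASE)
    return text


def closest_name(word, names=KNOWN_NAMES, cutoff=0.6):
    """Return the known name most similar to word, or word if none is close."""
    lowered = {name.lower(): name for name in names}
    matches = difflib.get_close_matches(word.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else word


raw = "Wisex Township. In front of the sheets and the golden mile road. Wisex three n c."
print(apply_replacements(raw))
print(closest_name("Wisox"))
```

A fixed table is predictable but has to be grown by hand; the fuzzy lookup catches new misspellings but can also "correct" a word that was right, so the cutoff needs tuning against real transcripts.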
 

millam

Old Radio Guy
Premium Subscriber
Joined
Jan 18, 2005
Messages
795
Reaction score
192
Wow, that's way above my pay grade. I'm just trying to visualize the audio: I'm hard of hearing and need too much volume for others in the house,
if you know who I mean. Headsets are a nuisance. I said Word "Transcribe", but I meant the "Dictate" function.

Mil
 

Ubbe

Member
Joined
Sep 8, 2006
Messages
11,017
Reaction score
4,719
Location
Stockholm, Sweden
I tried the online Microsoft 365 and Dictate in Word with only one VB-Cable from SDR#, using its DMR plugin, and it was more than 95% accurate. It even translated automatically when a foreign word was used.

It stops dictating as soon as I use another window or if there is silence for 30 seconds, so it's not really useful for scanner work.
It can go back several words and edit mistakes it has made, so it seems to be highly intelligent. If only it could stay permanently enabled, I could have it write everything that's being said on a channel and read that when I return to my PC. Much easier than listening through all the voice files.

/Ubbe
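
For recorded files rather than live dictation, a text log that can be read later might be built along these lines. This is a sketch under assumptions: the file name layout is guessed from the example recording earlier in the thread (`Litchfield_26_2022_07_29_13_23_26.mp3`), and `transcribe` is a placeholder for whichever speech-to-text service is used. The names `recording_time` and `build_log` are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed layout: <name>_<YYYY>_<MM>_<DD>_<HH>_<MM>_<SS>.<ext>
STAMP = re.compile(r"_(\d{4})_(\d{2})_(\d{2})_(\d{2})_(\d{2})_(\d{2})$")


def recording_time(path):
    """Parse the timestamp at the end of the file name, or return None."""
    match = STAMP.search(Path(path).stem)
    if match is None:
        return None
    return datetime(*(int(part) for part in match.groups()))


def build_log(folder, transcribe, pattern="*.mp3"):
    """Return one 'timestamp  text' line per recording, oldest first.

    transcribe is any callable that takes a path and returns a string.
    Files whose names do not carry a timestamp are skipped.
    """
    stamped = []
    for path in Path(folder).glob(pattern):
        when = recording_time(path)
        if when is not None:
            stamped.append((when, path))
    stamped.sort()
    return [f"{when:%Y-%m-%d %H:%M:%S}  {transcribe(path)}" for when, path in stamped]


print(recording_time("Litchfield_26_2022_07_29_13_23_26.mp3"))  # prints 2022-07-29 13:23:26
```

Writing the returned lines to a text file gives a channel log that can be skimmed instead of listening through each recording.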
 

millam

Old Radio Guy
Premium Subscriber
Joined
Jan 18, 2005
Messages
795
Reaction score
192
That was my conclusion also. Thanks,
Mil
 

Ubbe

Member
Joined
Sep 8, 2006
Messages
11,017
Reaction score
4,719
Location
Stockholm, Sweden
I was listening to some mil traffic at 267 MHz FM, as they are training Ukrainian pilots to fly the JAS Gripen they will be getting soon. Other scanners were interfering, so I couldn't really hear what they said, but Word printed out 100%: "Going supersonic mach 2 in 30 seconds"

/Ubbe
 