Audio file to text?

snoack321 · Jun 1, 2022

Is there a ProScan add-on or other software that can convert audio files to text, so the text can be streamed or hosted on a web server? Thanks in advance.

a417 · Jun 2, 2022

Can you describe your use case a little better? There are many ways to skin a cat, but I don't want you to be led down a dark and winding road that could end up in misery.

snoack321 · Jun 2, 2022

LOL, didn't mean to be opaque. I finally figured out a file-saving hierarchy that helps me find an audio file that contains something pertinent to my neighborhood. But it's still an audio file, and it would greatly help me share information with the neighborhood if I had a transcript (70% accurate) of the event. A bit wordy, but that's the use case. I've tried several apps and online services thus far, but none appear to be remotely useful (InqScribe, Otter.ai, IBM Watson). TIA, Scott

KD0TAZ · Jul 15, 2022

Radio transcription is pretty awful with off-the-shelf speech recognition products. I've tried a few and it really doesn't work at all. Personnel tend to talk fast, and their jargon confuses it (regular speech-to-text models are designed to interpret standard conversational or dictated speech, which doesn't include 10-codes, phonetic spellings, lots of last names, street names, etc.). There is a guy who is using AI and training models to enhance it, but it's still not there yet.

xicarusx · Jul 27, 2022

So I saw this post the other day and decided to give it a shot myself. Yeah, it's not perfect, and it can be weird at times, but it's kinda fun.

My transcription from the call (Actual):
Department 21, traffic control requested, Wysox Township. In front of the Sheetz on golden mile road. For motor vehicle accident. No injuries. Wysox three on scene.

The software's transcription of the call:
Department twenty one, traffic control requested. Wisex Township. In front of the sheets and the golden mile road. Front motor vehicle exit. No injuries. Wisex three n c.
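One way to put a number on "not perfect" (or on the "70% accurate" target from earlier in the thread) is word error rate: the word-level edit distance between a hand-made transcript and the software's output, divided by the number of words in the hand-made one. The sketch below is a plain-Python illustration of that standard metric, not something any poster here said they used; the function name and the punctuation handling are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Lowercase and drop basic punctuation so "Township." matches "Township"
    def tokens(text: str) -> list[str]:
        return [w.strip(".,?!") for w in text.lower().split() if w.strip(".,?!")]

    ref, hyp = tokens(reference), tokens(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    # prev[j] = edits needed to turn the reference words seen so far into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or exact match
        prev = curr
    return prev[-1] / len(ref)


actual = ("Department 21, traffic control requested, Wysox Township. In front of the Sheetz "
          "on golden mile road. For motor vehicle accident. No injuries. Wysox three on scene.")
heard = ("Department twenty one, traffic control requested. Wisex Township. In front of the sheets "
         "and the golden mile road. Front motor vehicle exit. No injuries. Wisex three n c.")
print(f"WER: {word_error_rate(actual, heard):.0%}")
```

Accuracy is roughly 1 minus the WER. Note that "21" versus "twenty one" counts as an error here, so normalizing numbers first would give a fairer score.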

racingfan360 · Jul 29, 2022

xicarusx said:
So I saw this post the other day and decided to give it a shot myself. Yeah, it's not perfect, and it can be weird at times, but it's kinda fun.

What software did you use for this xicarusx?

xicarusx · Jul 29, 2022

racingfan360 said:
What software did you use for this xicarusx?

I used Python and an API called Deepgram. Example script below... you will need a Deepgram API key, which is free and will allow you to transcribe 200 hours of audio before you need to pay. I have found the phonecall model works best, and the enhanced tier gives even better results.

Python:

import asyncio
from deepgram import Deepgram

# Written against the v2-style Deepgram Python SDK (from deepgram import Deepgram);
# newer SDK releases changed the client API.
deepgram_api_key = ''  # insert your API key here
path_to_file = '/home/pi/audio/Litchfield_26_2022_07_29_13_23_26.mp3'  # absolute path to file
file_mime_type = 'audio/mp3'  # MIME type of the file, e.g. 'audio/wav' or 'audio/mp3'

async def main():
    # Initialize the Deepgram SDK client
    dg_client = Deepgram(deepgram_api_key)

    with open(path_to_file, 'rb') as audio:
        source = {'buffer': audio, 'mimetype': file_mime_type}
        # Options: punctuation, phonecall model, and enhanced tier
        options = {'punctuate': True, 'language': 'en', 'model': 'phonecall', 'tier': 'enhanced'}

        response = await dg_client.transcription.prerecorded(source, options)
        transcript = response["results"]["channels"][0]["alternatives"][0]["transcript"]
        if transcript:
            print(transcript)
        else:
            print("No Transcription")

asyncio.run(main())
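The script above handles one hard-coded file and MIME type. If you want to run a whole folder of recordings through it, a small stdlib helper could pair each audio file with the MIME string to put in the `source` dict. This is a sketch only: the function name and the extension table are assumptions, and the table just reuses the two MIME values mentioned in the script's comment.

```python
from pathlib import Path

# Hypothetical extension table; 'audio/mp3' and 'audio/wav' are the values from the script's comment
MIME_TYPES = {".mp3": "audio/mp3", ".wav": "audio/wav"}

def audio_files_with_mime(folder: str) -> list[tuple[Path, str]]:
    """Return (path, mimetype) pairs for every recognized audio file in folder, sorted by name."""
    pairs = []
    for path in sorted(Path(folder).iterdir()):
        mime = MIME_TYPES.get(path.suffix.lower())  # case-insensitive extension lookup
        if path.is_file() and mime:
            pairs.append((path, mime))
    return pairs
```

You would then loop over `audio_files_with_mime('/home/pi/audio')` and open each path the same way the script opens `path_to_file`, passing the paired MIME type as `'mimetype'`.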

Example of a recent call:
Dispatch Audio: https://bcfirewire.com/audio/Litchfield_26_2022_07_29_13_23_26.mp3
Transcription: "Apartment twenty six. Curt Valley m Mass and Rescue Lakeville Township near the intersection of Riverside Drive in Park Hollow Road for working vehicle fire."
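The recording names in this example look like `Litchfield_26_2022_07_29_13_23_26.mp3`. Assuming the last six underscore-separated fields are year, month, day, hour, minute, and second (which fits this call being posted on Jul 29, 2022), the source label and time of each call could be pulled out of the file name so a transcript can be labeled when it is shared. The function name is hypothetical, and the naming pattern is inferred from this one example.

```python
from datetime import datetime
from pathlib import Path

def parse_recording_name(filename: str) -> tuple[str, datetime]:
    """Split 'Label_YYYY_MM_DD_HH_MM_SS.ext' into the label and a datetime.

    Assumes the last six underscore-separated fields are the timestamp;
    everything before them is treated as the source label.
    """
    parts = Path(filename).stem.split("_")
    if len(parts) < 7:
        raise ValueError(f"unexpected file name: {filename}")
    label = "_".join(parts[:-6])
    stamp = datetime(*(int(p) for p in parts[-6:]))
    return label, stamp

label, stamp = parse_recording_name("Litchfield_26_2022_07_29_13_23_26.mp3")
print(label, stamp.isoformat())  # Litchfield_26 2022-07-29T13:23:26
```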

It's not perfect, but you can get the type of call almost 95% of the time. I have worked out a way to get the road name and call type from these transcriptions.

DC31 · Jul 30, 2022

Interesting @xicarusx. It really does quite well. It has trouble with road names and town names but otherwise is fairly accurate.

DC31 · Jul 30, 2022

How are you deciphering?

xicarusx · Aug 1, 2022

DC31 said:
How are you deciphering?

Looking for keywords, based on how the dispatches and street names work in my area.

example:

Python:

# Load the transcript from the JSON response into a variable and make it lowercase.
transcript = response["results"]["channels"][0]["alternatives"][0]["transcript"].lower()

# Create a variable to hold the call type (action) we find.
action = ""

if "tree down" in transcript or "trees down" in transcript:
    action = "Tree down"
elif "alarm activation" in transcript or "fire alarm" in transcript or "brush fire" in transcript or "structure fire" in transcript or "vehicle fire" in transcript or "unknown type of fire" in transcript:
    if "fire alarm" in transcript or "alarm activation" in transcript:
        action = "Automatic Fire Alarm"
    elif "brush fire" in transcript:
        action = "Brush Fire"
    elif "vehicle fire" in transcript:
        action = "Vehicle Fire"
    elif "structure fire" in transcript:
        action = "Structure Fire"
    else:
        action = "Unknown type of fire"
else:
    action = "Unknown"

# Find "route" or "street" and pull the address out of the text around it.
# (Road, avenue, parkway, way, drive, lane, etc. could be handled the same way.)
if "route" in transcript:
    post = transcript.split("route")[1]
    address = post.split(".")[0].lstrip()
    # Routes are usually called by number ("County Route 64"), so change any text numbers
    # to integers (text2int is defined in the next block)
    address = text2int(address)

elif "street" in transcript:
    if "street," in transcript:
        transcript = transcript.replace("street,", "street.")
    pre = transcript.split("street")[0]
    words = pre.split()
    directions = {"north": "N.", "south": "S.", "east": "E.", "west": "W."}
    address = words[-1].capitalize() if words else "Unknown"
    # Check if the second word from last ("North Main Street") or the
    # third word from last ("North Hammond Dam Street") is a direction
    for offset in (2, 3):
        if len(words) < offset:
            break
        prefix = next((abbr for name, abbr in directions.items() if name in words[-offset]), None)
        if prefix:
            address = prefix + " " + words[-offset + 1].capitalize()
            break
else:
    address = "Unknown"

print(address + " | " + action)
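The keyword checks above could also be kept as an ordered lookup table, so adding a new call type means adding one row instead of another `elif`. This is a sketch, not the poster's code: the table and function names are hypothetical, and the rows simply copy the phrases and labels from the block above in the same priority order.

```python
# Ordered (phrases, call type) rows: the first row with a matching phrase wins,
# so more specific phrases should come before more general ones.
CALL_TYPES = [
    (("tree down", "trees down"), "Tree down"),
    (("fire alarm", "alarm activation"), "Automatic Fire Alarm"),
    (("brush fire",), "Brush Fire"),
    (("vehicle fire",), "Vehicle Fire"),
    (("structure fire",), "Structure Fire"),
    (("unknown type of fire",), "Unknown type of fire"),
]

def classify_call(transcript: str) -> str:
    """Return the first call type whose phrases appear in the transcript."""
    text = transcript.lower()
    for phrases, call_type in CALL_TYPES:
        if any(phrase in text for phrase in phrases):
            return call_type
    return "Unknown"
```

For the Litchfield example transcript ("...Park Hollow Road for working vehicle fire."), `classify_call` returns `"Vehicle Fire"`.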

The text2int helper used above:

Python:

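def is_number(x):
    # True if x (a string, possibly with thousands separators) parses as a number
    if isinstance(x, str):
        x = x.replace(',', '')
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True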


# numwords is a mutable default on purpose: it acts as a cache that is filled on the first call
def text2int(textnum, numwords={}):
    units = [
        'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
        'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen',
        'sixteen', 'seventeen', 'eighteen', 'nineteen',
    ]
    tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
    scales = ['hundred', 'thousand', 'million', 'billion', 'trillion']
    ordinal_words = {'first': 1, 'second': 2, 'third': 3, 'fifth': 5, 'eighth': 8, 'ninth': 9, 'twelfth': 12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    if not numwords:
        numwords['and'] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    textnum = textnum.replace('-', ' ')

    current = result = 0
    curstring = ''
    onnumber = False
    lastunit = False
    lastscale = False

    def is_numword(x):
        # True if x is a digit string or a known number word
        if is_number(x):
            return True
        if x in numwords:
            return True
        return False

    def from_numword(x):
        if is_number(x):
            scale = 0
            increment = int(x.replace(',', ''))
            return scale, increment
        return numwords[x]

    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0
            onnumber = True
            lastunit = False
            lastscale = False
        else:
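            # Strip ordinal endings ("fourth" -> "four", "sixtieth" -> "sixty"),
            # but only keep the stripped form if it really is a number word,
            # so words like "north" or "with" pass through unchanged
            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    stripped = word[:-len(ending)] + replacement
                    if is_number(stripped) or stripped in numwords:
                        word = stripped
                        break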

            if (not is_numword(word)) or (word == 'and' and not lastscale):
                if onnumber:
                    # Flush the current number we are building
                    curstring += repr(result + current) + " "
                curstring += word + " "
                result = current = 0
                onnumber = False
                lastunit = False
                lastscale = False
            else:
                scale, increment = from_numword(word)
                onnumber = True

                if lastunit and (word not in scales):
                    # Assume this is part of a string of individual numbers to
                    # be flushed, such as a zipcode "one two three four five"
                    curstring += repr(result + current)
                    result = current = 0

                if scale > 1:
                    current = max(1, current)

                current = current * scale + increment
                if scale > 100:
                    result += current
                    current = 0

                lastscale = False
                lastunit = False
                if word in scales:
                    lastscale = True
                elif word in units:
                    lastunit = True

    if onnumber:
        curstring += repr(result + current)

    return curstring

cpg178 · Feb 1, 2023

Does Deepgram learn over time if you continuously throw audio files at it and then have a way to give it the correct transcription?

nosoup4u · Feb 1, 2023

I got some funky results also, but it is still much better than my previous attempts trying to decode the audio.

freshfocus1013 · May 25, 2023

I am searching for the proper methods, processes, technologies, and group to invest resources in and commit myself to until highly accurate, satisfactory transcription of emergency personnel radio communications has been developed to the point where it can be easily implemented, so that we can have access to written files.
Wonderful job using AI and the API. I was grateful to see that you were using the tech before it became as widely popular as it has been in the last few weeks.
Although I am aware of the technology being older than a few weeks, I am not well versed in it. Just today, I started writing my own API code on a different project that I'm working on.
Which brings me to my question, and I apologize for being so long-winded.

In the conversation above, someone asked, "How are you deciphering?"
You replied that you are using the street names or something.
What are you having to decipher?
Have there been any recent updates? Do you have any information, hints, or help you could throw my way, or things you suggest I do or don't do? Or could you point me toward good resources I can study to make this system great?

Thank you for your time.

millam · Oct 23, 2023

Just wondering if anyone has used Word's "Transcribe" function successfully to convert audio
to text. I used an ATIS site and it didn't work too well, about 10%. I used a NOAA WX channel and it worked
pretty well. Haven't got it to work using VB-Cable out of DSDPlus yet; just started trying. Setting
up VB-Cable A out of SDR# into DSDPlus and VB-Cable B out of DSDPlus into the Word "Transcribe"
function isn't working yet, but maybe I can work it out. Any help would be appreciated.

Mil

Chance · Oct 23, 2023

In my experience so far, P25 Phase II quality is too poor for speech recognition. The best success I have had is with the automatic dispatching of fire calls. I use Microsoft Azure Cognitive Services with a custom trained model. The model consists of 300 WAV files with text-file transcripts. I also included a vocabulary of all call types and all street names in my city. I then run the output through about a hundred "find and replace" string operations to get an output that is about 95% correct.
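The "find and replace" post-processing step described above could look something like the sketch below. The actual replacement list wasn't posted, so the table here is hypothetical, built from the Wysox example earlier in the thread. It uses Python's `re` module with word boundaries so that, for example, "sheets" is fixed but "sheetsmith" is left alone.

```python
import re

# Hypothetical correction table: (what the recognizer tends to hear, what was actually said).
# Order matters, because later rules see the output of earlier ones.
CORRECTIONS = [
    ("wisex", "Wysox"),
    ("sheets", "Sheetz"),
    ("motor vehicle exit", "motor vehicle accident"),
    ("n c", "on scene"),
]

def apply_corrections(text: str, corrections=CORRECTIONS) -> str:
    """Apply whole-word, case-insensitive replacements in order."""
    for wrong, right in corrections:
        pattern = r"\b" + re.escape(wrong) + r"\b"
        text = re.sub(pattern, right, text, flags=re.IGNORECASE)
    return text

heard = ("Department twenty one, traffic control requested. Wisex Township. In front of the sheets "
         "and the golden mile road. Front motor vehicle exit. No injuries. Wisex three n c.")
print(apply_corrections(heard))
```

Rules this short ("n c") are risky on other calls, so in practice each rule would need to be checked against a batch of real transcripts.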

millam · Oct 23, 2023

Chance said:
In my experience so far, P25 Phase II quality is too poor for speech recognition. The best success I have had is with the automatic dispatching of fire calls. I use Microsoft Azure Cognitive Services with a custom trained model. The model consists of 300 WAV files with text-file transcripts. I also included a vocabulary of all call types and all street names in my city. I then run the output through about a hundred "find and replace" string operations to get an output that is about 95% correct.

Wow, that's way above my pay grade. I'm just trying to visualize the audio; I'm hard of hearing and need too much volume for others in the house,
if you know who I mean. Headsets are a nuisance. I said Word "Transcribe", but I meant the "Dictate" function.

Mil

Ubbe · Oct 24, 2023

I tried the online Microsoft 365 Dictate in Word. I have only one VB-Cable from SDR#, used with its DMR plugin, and it was more than 95% accurate. It even translated automatically if a foreign word was used.

It stops dictating as soon as I switch to another window or if there are 30 seconds of silence, so it's not really useful for scanner work.
It can go back several words and correct mistakes it has made, so it seems highly intelligent. If only it could stay permanently enabled. Then I could have it write everything that's being said on a channel and read it when I return to my PC. Much easier than listening through all the voice files.
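If the transcripts come from a script instead (for example the Deepgram script earlier in the thread), nothing stops when you switch windows, and each transcript could be appended to a daily log file to read later. A minimal sketch follows; the function name, the one-file-per-day layout, and the line format are all assumptions for illustration.

```python
from datetime import datetime
from pathlib import Path

def append_to_log(log_dir: str, channel: str, transcript: str, when: datetime) -> Path:
    """Append one timestamped transcript line to a per-day log file and return its path."""
    folder = Path(log_dir)
    folder.mkdir(parents=True, exist_ok=True)
    log_path = folder / f"{when:%Y-%m-%d}.txt"  # e.g. 2023-10-24.txt
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"{when:%H:%M:%S} [{channel}] {transcript.strip()}\n")
    return log_path
```

Calling it once per recording, for example `append_to_log("transcripts", "267 MHz", text, datetime.now())`, builds up one readable text file per day.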

/Ubbe

millam · Oct 24, 2023

Ubbe said:
I tried the online Microsoft 365 Dictate in Word. I have only one VB-Cable from SDR#, used with its DMR plugin, and it was more than 95% accurate. It even translated automatically if a foreign word was used.

It stops dictating as soon as I switch to another window or if there are 30 seconds of silence, so it's not really useful for scanner work.
It can go back several words and correct mistakes it has made, so it seems highly intelligent. If only it could stay permanently enabled. Then I could have it write everything that's being said on a channel and read it when I return to my PC. Much easier than listening through all the voice files.

/Ubbe

That was my conclusion also. Thanks,
Mil

Ubbe · Oct 24, 2023

I was listening to some mil traffic at 267 MHz FM, as they are training Ukrainian pilots to fly the JAS Gripen they will be getting soon. Other scanners were interfering, so I couldn't really hear what they said, but Word printed out 100% "Going supersonic mach 2 in 30 seconds"

/Ubbe
