Tom_Neverwinter
Member
While radio software capably converts radio waves to audio, the real advantage lies in turning that audio into immediately scannable text. Imagine receiving an alert from your home system, capturing vital over-the-air communications even while you're away, and instantly knowing whether a concerning event, like an incident on your block or at your residence, is unfolding. This lets you quickly separate actionable intelligence from background noise and respond fast in urgent situations. Unlike other solutions such as Whisper, which can be resource-intensive and slow for real-time use, Parakeet TDT 0.6B is a robust and efficient alternative: it handles both faster-than-real-time transcription and efficient batch processing while requiring fewer resources, making it usable even on older or less powerful systems, so no critical detail is missed in time-sensitive scenarios.
Hopefully something better comes along like this, but with timestamp support and translation.
I have been outputting DSD+ audio to an input folder, which Parakeet then parses, writing results to an output folder and making a subtitle (SRT) file that other software can check for keywords or context. The word error rate isn't fantastic (about 6 words per 100), but that's roughly on par with human transcribers, so why not.
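For the keyword check, here's a minimal sketch of what the downstream software could look like (the keyword list, folder name, and alert hook are my own placeholders, not part of my actual setup):

```python
import glob
import os

# Hypothetical keyword list -- tune to whatever matters on your block
KEYWORDS = {"fire", "shots", "pursuit", "ambulance"}

def scan_transcripts(output_dir="output"):
    """Return {filename: [matched keywords]} for every transcript found."""
    hits = {}
    for path in glob.glob(os.path.join(output_dir, "*.txt")):
        with open(path, encoding="utf-8") as f:
            words = set(f.read().lower().split())
        matched = sorted(KEYWORDS & words)
        if matched:
            hits[os.path.basename(path)] = matched
    return hits

if __name__ == "__main__":
    for fname, kws in scan_transcripts().items():
        print(f"ALERT {fname}: {', '.join(kws)}")
```

Swap the print for email, push notifications, or whatever alerting you already have.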
I use this model:

nvidia/parakeet-tdt-0.6b-v2 · Hugging Face
nvidia/parakeet-rnnt-1.1b · Hugging Face is a multilingual model, but it's slower and sadly doesn't translate; it just outputs the text in whatever language was spoken.
I installed the NeMo framework like this:
Windows install:
install Python 3.11, then:
py -3.11 -m venv .venv
.venv\Scripts\activate
pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -U "nemo_toolkit[asr]" soundfile
Then I made a bat file and a Python file to run the thing and perform chunking inside the main folder.
The Python file loads my model, chunks each recording from the input folder, processes it, and writes the result to the output folder:
"long_batch_transcribe_chunked.py"
import os
import shutil
import tempfile
import subprocess
import glob

from nemo.collections.asr.models import ASRModel

# ─── CONFIG ─────────────────────────────────────────────────────────────
INPUT_DIR = "input"
OUTPUT_DIR = "output"
MODEL_DIR = "models/"
CHECKPOINT = os.path.join(MODEL_DIR, "parakeet-tdt-0.6b-v2.nemo")
# CHECKPOINT = os.path.join(MODEL_DIR, "Parakeet-RNNT-XXL-1.1b_merged_universal_spe8.5k_1.0.nemo")
CHUNK_SECONDS = 60  # chunk length in seconds

# ─── PREPARE DIRS & OFFLINE ─────────────────────────────────────────────
os.makedirs(INPUT_DIR, exist_ok=True)
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

# ─── LOAD & CONFIGURE MODEL FOR STREAMING ───────────────────────────────
print(f"Loading model from {CHECKPOINT}…")
model = ASRModel.restore_from(CHECKPOINT).to("cuda")
model.change_attention_model("rel_pos_local_attn", [128, 128])
model.change_subsampling_conv_chunking_factor(1)

# ─── PROCESS EACH AUDIO FILE IN INPUT_DIR ───────────────────────────────
for fname in os.listdir(INPUT_DIR):
    src = os.path.join(INPUT_DIR, fname)
    if not os.path.isfile(src):
        continue
    print(f"\n⏳ Processing {fname}…")

    # 1) use ffmpeg to split into CHUNK_SECONDS-long mono 16 kHz WAVs
    tmpdir = tempfile.TemporaryDirectory()
    pattern = os.path.join(tmpdir.name, "chunk_%06d.wav")
    cmd = [
        "ffmpeg", "-y", "-i", src,
        "-ac", "1", "-ar", "16000",
        "-f", "segment", "-segment_time", str(CHUNK_SECONDS),
        "-c:a", "pcm_s16le", pattern,
    ]
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

    # 2) transcribe each chunk in order
    transcripts = []
    for chunk in sorted(glob.glob(os.path.join(tmpdir.name, "chunk_*.wav"))):
        print(f"  Transcribing {os.path.basename(chunk)}…")
        text = model.transcribe([chunk])[0].text
        transcripts.append(text)

    # 3) combine & write out the full transcript
    full_text = "\n".join(transcripts)
    base = os.path.splitext(fname)[0]
    out_txt = os.path.join(OUTPUT_DIR, f"{base}.txt")
    with open(out_txt, "w", encoding="utf-8") as f:
        f.write(full_text)
    print(f"▶ Transcript saved to {out_txt}")

    # 4) move original audio into OUTPUT_DIR
    shutil.move(src, os.path.join(OUTPUT_DIR, fname))
    print(f"✔ Moved {fname} → {OUTPUT_DIR}")

    # 5) clean up temp chunks
    tmpdir.cleanup()

print("\n✅ All files processed.")

Then the bat file to tell it what to do and make it easier to interface with:
@echo off
REM ─── Navigate to script directory ─────────────────────────────────────
cd /d %~dp0
REM ─── Environment & Config ─────────────────────────────────────────────
set HF_HUB_OFFLINE=1
set TRANSFORMERS_OFFLINE=1
REM (These values are informational only; the Python script currently hardcodes its own config)
set INPUT_DIR=input
set OUTPUT_DIR=output
set MODEL_DIR=models
set CHECKPOINT=%MODEL_DIR%\Parakeet-RNNT-XXL-1.1b_merged_universal_spe8.5k_1.0.nemo
set CHUNK_SECONDS=60
REM ─── Create folders if they don’t exist ───────────────────────────────
if not exist "%INPUT_DIR%" mkdir "%INPUT_DIR%"
if not exist "%OUTPUT_DIR%" mkdir "%OUTPUT_DIR%"
REM ─── Run the Python transcription script ──────────────────────────────
echo.
echo Loading model from %CHECKPOINT% and processing all audio in %INPUT_DIR%…
echo.
call .venv\Scripts\activate
python long_batch_transcribe_chunked.py
echo.
echo DONE.
pause
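Since the chunks are a fixed 60 seconds, turning the per-chunk texts into the SRT file I mentioned is mostly timestamp arithmetic. A sketch (fmt_ts and write_srt are my own illustrative names, not part of the script above):

```python
def fmt_ts(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def write_srt(transcripts, path, chunk_seconds=60):
    """Write one SRT cue per chunk, numbered from 1."""
    with open(path, "w", encoding="utf-8") as f:
        for i, text in enumerate(transcripts):
            start = i * chunk_seconds
            end = start + chunk_seconds
            f.write(f"{i + 1}\n{fmt_ts(start)} --> {fmt_ts(end)}\n{text}\n\n")
```

In the main script, calling write_srt(transcripts, os.path.join(OUTPUT_DIR, f"{base}.srt")) alongside the .txt write would produce both outputs. Note the timestamps are only chunk-level, not word-level.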
To run the software I just have Windows Task Scheduler run the bat file every few minutes.
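If Task Scheduler is awkward on your machine, a tiny always-on polling loop gets the same effect; run_every, the bat filename, and the 300-second interval below are all my own placeholders:

```python
import subprocess
import time

def run_every(cmd, interval_seconds=300, iterations=None):
    """Run cmd repeatedly; iterations=None means run forever (like the scheduler)."""
    count = 0
    while iterations is None or count < iterations:
        subprocess.run(cmd, check=False)  # don't abort the loop on one failure
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_seconds)
    return count
```

For example, run_every(["cmd", "/c", "run_transcribe.bat"], interval_seconds=300) would mimic the every-few-minutes schedule.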
I hope this helps people and makes the world a little better. Sorry if this tutorial is a little sloppy; please feel free to make improvements.