Working on a project: ScanScribe | AI-powered speech-to-text transcriber built for scanners.

ScanScribe

Member
Joined
Nov 30, 2017
Messages
84
Reaction score
20
Location
St. Francois county
Hey all! I've recently been working on a not-so-little project called ScanScribe: an AI-powered speech-to-text transcriber built specifically for radio and written in Python.

How it works: ScanScribe monitors an "ingest" folder (like rdio-scanner does), which is where you point recorders such as SDRTrunk or ProScan to save their audio. ScanScribe then uses a Whisper speech-to-text AI model, fine-tuned (MANUALLY BY ME) on 600 different radio audio clips so far, to accurately transcribe your audio files to a text file.

Requirements:
Basic Python knowledge (enough to know how to run the script).
An NVIDIA GPU or a heavy-duty CPU.

Would anyone be interested in further development or beta testing?
 

Enforcer52

Broadcastify, Calls Platform, Public Playlist
Feed Provider
Joined
Sep 4, 2023
Messages
983
Reaction score
849
Location
Lake Livingston, TX
I would be interested in testing it out.

Does it do real time transcribing of each transmission, or only on demand for a particular file, or both options?
 

ScanScribe

As soon as your recorder is done with a clip, it gets sent down the pipeline to be transcribed. It can also work through an entire folder of audio clips until it's caught up, then carry on with live transcription. DM me for details if you're interested, but keep in mind that this is experimental! It might not work as well for other people because the fine-tuned model is based on my area's radio traffic. I am curious to see how it performs, though!
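
For anyone wondering what "catch up on the backlog, then go live" could look like, here is a minimal sketch. It is not ScanScribe's actual code: the function names, the extension list, the polling interval, and the `transcribe` callable are all assumptions made for illustration.

```python
import time
from pathlib import Path

# Assumed extension list; the real set of supported formats is not given.
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def pending_clips(ingest_dir, seen):
    """Return audio files in ingest_dir that are not in `seen`, oldest first."""
    clips = [
        p for p in Path(ingest_dir).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS and p not in seen
    ]
    clips.sort(key=lambda p: p.stat().st_mtime)
    return clips


def run(ingest_dir, transcribe, poll_seconds=1.0, max_polls=None):
    """Drain whatever is already in the folder, then keep polling for new clips.

    `transcribe` is a hypothetical callable standing in for the model.
    `max_polls` exists only so a demo or test can stop the loop.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for clip in pending_clips(ingest_dir, seen):
            transcribe(clip)
            seen.add(clip)
        polls += 1
        time.sleep(poll_seconds)
```

The first pass through the loop handles the backlog in recording order. Every later pass only sees files that arrived since the previous one, which is the "live" behaviour.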
 

sloan

Member
Premium Subscriber
Joined
Mar 5, 2008
Messages
7
Reaction score
2
So crazy. I just asked ChatGPT if this was already a thing and it linked me to this thread that was posted hours before. I’m super interested!
 

ScanScribe

Update on ScanScribe.
[Attached screenshot: Python 3.11 7_27_2025 11_29_10 AM.png]
================================================================================
SCANSCRIBE FEATURES & FUNCTIONALITY SUMMARY
================================================================================

Version: Current BETA
Date: 2025-01-07
================================================================================

🎯 CORE TRANSCRIPTION ENGINE
================================================================================

• Real-time Audio Monitoring
- Automatic file detection and processing (Point recorders to the 'ingest' folder)
- Multi-threaded processing (1-16 configurable workers)
- GPU acceleration with CUDA support
- File stability checking to prevent partial processing
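
A rough illustration of the file stability check, assuming it works by waiting until the file size stops changing (the summary does not say how ScanScribe does it). The function name and the default values are hypothetical.

```python
import os
import time


def is_stable(path, checks=2, interval=0.5):
    """Return True if the file is non-empty and its size is unchanged
    across `checks` consecutive polls spaced `interval` seconds apart."""
    try:
        last = os.path.getsize(path)
    except OSError:
        return False  # file vanished or is not readable yet
    for _ in range(checks):
        time.sleep(interval)
        try:
            size = os.path.getsize(path)
        except OSError:
            return False
        if size == 0 or size != last:
            return False  # still empty or still being written
        last = size
    return True
```

A watcher would call `is_stable(path)` before handing a clip to the transcriber, and simply retry on the next poll when it returns False.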

• Voice Activity Detection (VAD)
- Adjustable sensitivity thresholds (0.0-1.0)
- Configurable VAD usage
- Audio preprocessing optimization
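
To show what a 0.0-1.0 sensitivity threshold does in practice, here is a deliberately simple energy-based detector. This is a swapped-in technique for illustration only, not whatever VAD ScanScribe actually uses. Samples are assumed to be floats in the range -1.0 to 1.0, and the frame length and default threshold are arbitrary.

```python
import math


def frame_rms(samples, frame_len):
    """Yield the RMS level of each full frame of `frame_len` samples."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        yield math.sqrt(sum(s * s for s in frame) / frame_len)


def voiced_ratio(samples, threshold=0.05, frame_len=160):
    """Return the fraction of frames whose RMS level is at or above `threshold`.

    With samples in [-1.0, 1.0] the RMS is also in [0.0, 1.0], so the
    threshold lives on the same 0.0-1.0 scale.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    levels = list(frame_rms(samples, frame_len))
    if not levels:
        return 0.0
    return sum(level >= threshold for level in levels) / len(levels)
```

Raising the threshold makes the detector stricter, so fewer frames count as voice. A clip whose ratio is near zero could be skipped before it ever reaches the model.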

• Quality Assessment
- Confidence scoring for transcription quality
- Color-coded confidence levels (Green/Yellow/Red)
- Confidence filtering options
- Quality-based file rejection
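
One possible way to turn a confidence score into the Green/Yellow/Red levels. The cut-off values are assumptions, as is deriving the score from an average token log-probability. The summary does not say how ScanScribe computes confidence.

```python
import math


def confidence_from_logprob(avg_logprob):
    """Map an average token log-probability (0 or below) to a 0.0-1.0 score."""
    return math.exp(min(avg_logprob, 0.0))


def confidence_color(confidence, green_at=0.80, yellow_at=0.50):
    """Bucket a 0.0-1.0 confidence score. The two cut-offs are assumed values."""
    if confidence >= green_at:
        return "Green"
    if confidence >= yellow_at:
        return "Yellow"
    return "Red"
```

Confidence filtering then becomes a one-liner: drop or flag any transcript whose colour is "Red".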

• File Processing
- Automatic file cleanup option
- File rejection filters (size/duration based)
- Metadata extraction from MP3 files
- Multiple audio format support
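
A sketch of size- and duration-based rejection. It only handles WAV files, because Python's standard `wave` module can read their length; MP3 clips would need a library such as Mutagen. The limits and function names below are made up for the example.

```python
import os
import wave


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def rejection_reason(path, min_bytes=1024, min_seconds=0.5, max_seconds=120.0):
    """Return why a WAV clip should be skipped, or None to accept it.

    All three limits are illustrative defaults, not ScanScribe's settings.
    """
    if os.path.getsize(path) < min_bytes:
        return "too small"
    duration = wav_duration_seconds(path)
    if duration < min_seconds:
        return "too short"
    if duration > max_seconds:
        return "too long"
    return None
```

Checking the byte size first is cheap and weeds out empty or key-up-only recordings without opening the file.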

================================================================================
🎨 MODERN GUI INTERFACE
================================================================================

• Visual Design
- Dark/Light theme support
- Customizable appearance settings
- Split-pane layout (console + history)
- Responsive design with resizable panels

• Interactive Features
- Auto-scroll with manual override
- Real-time counters and statistics
- Interactive transcription entries
- Audio replay functionality
- Settings dialog with comprehensive options

• Display Options
- Configurable history limits (1-1000 entries)
- Time format options (12/24-hour)
- File metadata display
- Archive management interface

================================================================================
📊 ADVANCED DISPLAY & MONITORING
================================================================================

• Real-time Statistics
- Files remaining counter
- Total processed counter
- Archive file counter
- Processing status indicators

• Quality Control
- Confidence-based filtering
- Quality indicators with visual feedback
- Transcription accuracy assessment
- Error logging and debugging

• Archive Management
- File counting and organization
- Bulk deletion capabilities
- Archive clearing functions
- File retention policies

================================================================================
🔧 CONFIGURATION & SETTINGS
================================================================================

• Model Management
- Whisper model selection (currently a custom fine-tuned Whisper model trained on over 2,000 radio transmissions for more accurate transcriptions; the more training transmissions, the better)
- Custom model integration
- Model comparison tools
- Performance optimization settings

• API Integration
- Gemini API key management
- Hugging Face token authentication
- External service connectivity
- Secure credential storage

• Processing Options
- VAD configuration
- File rejection settings
- Cleanup preferences
- Worker thread management

• Display Preferences
- Theme selection
- Time format configuration
- History limits
- Auto-scroll settings

================================================================================
📝 REVIEW & QUALITY CONTROL TOOLS
================================================================================

• Review Tool
- Manual transcription review interface
- Audio playback controls
- Batch processing capabilities
- File organization (processed/deleted/skipped)
- Model comparison with multiple variants
- CSV export functionality

• WER (Word Error Rate) Measuring Tool
- Accuracy assessment of transcription models
- Performance comparison between models
- Statistical analysis of transcription quality
- Metadata management for evaluation datasets
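
For reference, word error rate is the number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of reference words. A minimal sketch of the standard calculation follows; the function name and the lowercase-and-split normalisation are assumptions, not ScanScribe's tool.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # handled so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

For example, scoring "engine nine respond main street" against the reference "engine five respond to main street" gives one substitution and one deletion over six reference words, so a WER of about 0.33.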

• Quality Assessment
- Confidence scoring algorithms
- Quality-based filtering
- Error detection and reporting
- Performance metrics tracking

================================================================================
🎓 FINE-TUNING CAPABILITIES
================================================================================

• Custom Model Training
- Domain-specific audio training
- Dataset preparation tools
- Training configuration management
- Model optimization techniques

• Training Configuration
- Epoch management (default: 6)
- Batch size optimization (default: 2)
- Gradient accumulation steps (default: 2)
- Learning rate adjustment (default: 1e-5)
- Warmup ratio settings (default: 0.1)
- Mixed precision training (FP16)
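
To make those defaults concrete, the sketch below works out the effective batch size and the approximate number of optimizer and warmup steps. The field names mirror Hugging Face `TrainingArguments`, but whether ScanScribe uses that trainer is an assumption, and the step counts are approximations for a single-GPU run.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    # Defaults taken from the feature summary above.
    num_train_epochs: int = 6
    per_device_train_batch_size: int = 2
    gradient_accumulation_steps: int = 2
    learning_rate: float = 1e-5
    warmup_ratio: float = 0.1
    fp16: bool = True

    def effective_batch_size(self, num_devices=1):
        """Clips contributing to each optimizer update."""
        return (self.per_device_train_batch_size
                * self.gradient_accumulation_steps
                * num_devices)

    def total_optimizer_steps(self, num_clips, num_devices=1):
        """Approximate optimizer updates over the whole training run."""
        per_epoch = math.ceil(num_clips / self.effective_batch_size(num_devices))
        return per_epoch * self.num_train_epochs

    def warmup_steps(self, num_clips, num_devices=1):
        """Approximate number of updates spent ramping up the learning rate."""
        total = self.total_optimizer_steps(num_clips, num_devices)
        return math.ceil(total * self.warmup_ratio)
```

With an illustrative dataset of 2,000 clips on one GPU, the effective batch size is 4, which gives roughly 3,000 optimizer steps in total, the first 300 or so being warmup.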

• Performance Optimization
- GPU memory management
- Gradient checkpointing
- Memory-efficient attention
- Training progress monitoring

================================================================================
🔄 FILE MANAGEMENT & PROCESSING
================================================================================

• File Operations
- Automatic file detection
- File stability verification
- Metadata extraction
- Format conversion support
- Error handling and recovery

• Archive System
- Processed file storage
- File organization
- Archive management tools
- Bulk operations support

• Processing Pipeline
- Multi-stage audio processing
- Quality control integration
- Error recovery mechanisms
- Performance optimization

================================================================================
📈 PERFORMANCE & MONITORING
================================================================================

• System Monitoring
- GPU memory usage tracking
- Processing statistics
- Performance metrics
- Resource utilization

• Logging & Debugging
- Comprehensive log generation
- Error tracking and reporting
- Debug information output
- Performance analysis tools

• Optimization Features
- GPU memory optimization
- Processing efficiency improvements
- Resource management
- Performance tuning options

================================================================================
🔐 SECURITY & INTEGRATION
================================================================================

• Authentication
- API key management
- Hugging Face integration
- Secure credential storage
- Token-based authentication

• Cross-Platform Support
- Windows compatibility
- Linux support
- macOS compatibility
- Platform-specific optimizations

• Data Security
- Secure configuration storage
- Encrypted credential handling
- Privacy protection measures
- Data integrity verification

================================================================================
🎯 USE CASES & APPLICATIONS
================================================================================

• Radio Communications
- Real-time transcription
- Communication monitoring
- Quality assessment
- Archive management

• Public Safety
- Emergency communications
- Audio processing
- Quality control
- Documentation

• Quality Assurance
- Transcription accuracy
- Model evaluation
- Performance assessment
- Continuous improvement

• Research & Development
- Model fine-tuning
- Dataset preparation
- Performance analysis
- Custom model development

================================================================================
🛠 TECHNICAL ARCHITECTURE
================================================================================

• Framework & Libraries
- PySide6 GUI framework
- PyTorch for ML operations
- Transformers for model handling
- Librosa for audio processing

• Design Patterns
- Multi-threading architecture
- Signal-based communication
- Modular component design
- Event-driven processing

• Configuration Management
- INI file configuration
- Dynamic settings updates
- Persistent storage
- Environment-specific settings
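
An INI-based settings loader could be sketched with Python's built-in `configparser`. The section names, key names, and default values here are hypothetical. Only the ranges (1-16 workers, 0.0-1.0 VAD threshold, 1-1000 history entries) come from the summary above.

```python
import configparser

# Hypothetical defaults; ScanScribe's real INI layout is not published.
DEFAULTS = """
[processing]
workers = 4
use_vad = true
vad_threshold = 0.5

[display]
theme = dark
history_limit = 200
time_format = 24
"""


def clamp(value, low, high):
    """Keep `value` inside the inclusive range [low, high]."""
    return max(low, min(high, value))


def load_settings(user_ini=""):
    """Parse the defaults, overlay the user's INI text, and clamp the ranges."""
    parser = configparser.ConfigParser()
    parser.read_string(DEFAULTS)
    parser.read_string(user_ini)  # later values override the defaults
    return {
        "workers": clamp(parser.getint("processing", "workers"), 1, 16),
        "use_vad": parser.getboolean("processing", "use_vad"),
        "vad_threshold": clamp(parser.getfloat("processing", "vad_threshold"), 0.0, 1.0),
        "theme": parser.get("display", "theme"),
        "history_limit": clamp(parser.getint("display", "history_limit"), 1, 1000),
        "time_format": parser.get("display", "time_format"),
    }
```

Clamping at load time means a hand-edited file asking for 64 workers still yields 16 instead of crashing or oversubscribing the machine.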

• Error Handling
- Graceful degradation
- Error recovery mechanisms
- Comprehensive logging
- User-friendly error messages

================================================================================
📋 SYSTEM REQUIREMENTS
================================================================================

• Hardware
- CUDA-compatible GPU (recommended)
- Minimum 8GB RAM
- Sufficient storage for audio files
- Audio input/output capabilities

• Software
- Python 3.8+
- PySide6
- PyTorch with CUDA support
- Audio processing libraries

• Dependencies
- Transformers library
- Librosa for audio
- Mutagen for metadata
- NumPy for calculations

================================================================================
🚀 FUTURE ENHANCEMENTS
================================================================================

• Planned Features
- Additional model support
- Enhanced GUI capabilities
- Improved performance optimization
- Extended file format support

• Development Roadmap
- Advanced quality metrics
- Real-time collaboration tools
- Cloud integration options
- Mobile application support

================================================================================

This comprehensive system provides end-to-end audio transcription capabilities
with advanced quality control, review tools, and fine-tuning capabilities for
specialized use cases in radio communications and public safety applications.

================================================================================
 

merlin

Active Member
Joined
Jul 3, 2003
Messages
3,716
Reaction score
1,833
Location
DN32su
A clever notion. I thought of that a while back. I have a hardware transcriber that works OK, but a good clear voice helps.
Look into Rust if you want to develop something along these lines; it supports both Linux and Windows platforms.
 

Vixus

Newbie
Premium Subscriber
Joined
Oct 25, 2021
Messages
1
Reaction score
0
Thanks for the update!
What a wonderful and ambitious project! I have auditory processing issues and am not a native English speaker, so I've been looking into a solution like this myself to help supplement the 'noisy signal' I hear with something I can read.

I'm currently running a somewhat jank split audio signal setup that is routed through a Python project named Radiodictator. It uses Whisper to monitor a virtual audio input. It works OK, but it's incredibly lackluster unless you follow the conversation live while watching the raw console output.

If you'd like to have someone assist with testing, you're welcome to shoot me a DM.
Win/Linux - RTX 3090Ti -- SDS200E + 2X RTL-SDR
:)
 

Chinook75

Newbie
Joined
Oct 21, 2024
Messages
1
Reaction score
0
I am interested in trying this out as well. I'm using RTL-SDR v3/v4 dongles and an Airspy Mini, monitoring civ and mil airband.
 

ScanScribe

Cancelling the project. All the people who were interested ghosted me after I gave them access. Some even claimed my work is a virus, even with proof that it isn't. Guess this will be a personal program.
 

Trilliumaire

Member
Premium Subscriber
Joined
Aug 14, 2025
Messages
46
Reaction score
19
Location
Evergreen, CO
Maybe just open source it. I am an AI developer, and I also moderate the OpenAI Developer Forum. Whisper is a great model to run locally, and it is still relevant. Besides transcription, you can mean-pool the hidden layers to create embedding vectors, which are useful for clustering and classification.
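
As a rough sketch of what I mean by mean pooling: take the per-frame hidden-state vectors from the encoder and average them over time to get one fixed-size vector per clip. The code below uses plain Python lists so it runs anywhere. In practice the frames would come from the model as tensors, and the function names here are just for illustration.

```python
import math


def mean_pool(hidden_states):
    """Average a (frames x features) matrix over the time axis."""
    if not hidden_states:
        raise ValueError("need at least one frame")
    n = len(hidden_states)
    return [sum(column) / n for column in zip(*hidden_states)]


def cosine_similarity(a, b):
    """Compare two embedding vectors; 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Clips whose pooled vectors have a high cosine similarity can then be grouped together, for example by talkgroup or by type of call.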
 