Working on a project: ScanScribe | AI-powered speech-to-text transcriber built for scanners.

ScanScribe · Jul 9, 2025

Hey all! I've recently been working on a not-so-little project called ScanScribe: an AI-powered speech-to-text transcriber built specifically for radio traffic and written in Python.

How it works: ScanScribe monitors an "ingest" folder (like rdio-scanner does), which is where you point recorders such as SDRTrunk or ProScan to save audio. ScanScribe then uses a Whisper speech-to-text model, fine-tuned (MANUALLY BY ME) on 600 radio audio clips so far, to transcribe your audio files to text files.
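
For anyone wondering what the ingest-folder idea looks like in code, here is a minimal sketch, not ScanScribe's actual implementation. The function name `process_ingest`, the extension list, and the choice to write the transcript next to the clip are all assumptions for illustration. The `transcribe` argument stands in for whatever model call you use, so the sketch runs without a GPU.

```python
from pathlib import Path
from typing import Callable, List

# Assumed extension list; the thread does not say which formats are watched.
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def process_ingest(ingest_dir: Path, transcribe: Callable[[Path], str]) -> List[Path]:
    """Transcribe every clip in ingest_dir that has no transcript yet.

    Clips are handled oldest first. Returns the transcript files written
    during this pass.
    """
    clips = sorted(
        (p for p in ingest_dir.iterdir() if p.suffix.lower() in AUDIO_EXTENSIONS),
        key=lambda p: p.stat().st_mtime,
    )
    written = []
    for clip in clips:
        out = clip.with_suffix(".txt")
        if out.exists():
            continue  # already transcribed on an earlier pass
        out.write_text(transcribe(clip), encoding="utf-8")
        written.append(out)
    return written
```

Calling this in a loop with a short sleep between passes would first drain any backlog and then pick up new clips as the recorder drops them in.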

Requirements:
Basic Python knowledge (enough to run the script).
An NVIDIA GPU or a heavy-duty CPU.

Would anyone be interested in further development or beta testing?

Enforcer52 · Jul 9, 2025

I would be interested in testing it out.

Does it do real time transcribing of each transmission, or only on demand for a particular file, or both options?

pb_lonny · Jul 9, 2025

This sounds awesome. I have some experience with Python.

ScanScribe · Jul 9, 2025

Enforcer52 said:
I would be interested in testing it out.

Does it do real time transcribing of each transmission, or only on demand for a particular file, or both options?

As soon as your recorder is done with a clip, it gets sent down the pipeline to be transcribed. It can also process an entire folder of audio clips until it's caught up, and then it proceeds with live transcription. DM me for details if you're interested, but keep in mind that this is experimental! It might not work as well for other people because the fine-tuned model is based on my area's radio traffic. I am curious to see how it performs, though!

sloan · Jul 9, 2025

xxbubziexx said:
As soon as your recorder is done with a clip, it gets sent down the pipeline to be transcribed. It can also process an entire folder of audio clips until it's caught up, and then it proceeds with live transcription. DM me for details if you're interested, but keep in mind that this is experimental! It might not work as well for other people because the fine-tuned model is based on my area's radio traffic. I am curious to see how it performs, though!

So crazy. I just asked ChatGPT if this was already a thing and it linked me to this thread that was posted hours before. I’m super interested!

ScanScribe · Aug 8, 2025

Update on ScanScribe.

================================================================================
SCANSCRIBE FEATURES & FUNCTIONALITY SUMMARY
================================================================================

Version: Current BETA
Date: 2025-01-07
================================================================================

🎯 CORE TRANSCRIPTION ENGINE
================================================================================

• Real-time Audio Monitoring
- Automatic file detection and processing (Point recorders to the 'ingest' folder)
- Multi-threaded processing (1-16 configurable workers)
- GPU acceleration with CUDA support
- File stability checking to prevent partial processing

• Voice Activity Detection (VAD)
- Adjustable sensitivity thresholds (0.0-1.0)
- Configurable VAD usage
- Audio preprocessing optimization

• Quality Assessment
- Confidence scoring for transcription quality
- Color-coded confidence levels (Green/Yellow/Red)
- Confidence filtering options
- Quality-based file rejection

• File Processing
- Automatic file cleanup option
- File rejection filters (size/duration based)
- Metadata extraction from MP3 files
- Multiple audio format support
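
The "file stability checking" item above could be done in several ways. One common approach, sketched here as an assumption rather than as ScanScribe's actual method, is to re-read the file size a few times and only accept the file once it has stopped growing. The name `is_stable` and its default timings are hypothetical.

```python
import time
from pathlib import Path


def is_stable(path: Path, checks: int = 2, interval: float = 0.5) -> bool:
    """Return True if the file is non-empty and its size did not change
    across `checks` re-reads spaced `interval` seconds apart."""
    last_size = path.stat().st_size
    for _ in range(checks):
        time.sleep(interval)
        size = path.stat().st_size
        if size != last_size:
            return False  # still growing: the recorder has not finished
        last_size = size
    return last_size > 0
```

A file that fails the check is simply left alone and retried on the next scan of the ingest folder.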

================================================================================
🎨 MODERN GUI INTERFACE
================================================================================

• Visual Design
- Dark/Light theme support
- Customizable appearance settings
- Split-pane layout (console + history)
- Responsive design with resizable panels

• Interactive Features
- Auto-scroll with manual override
- Real-time counters and statistics
- Interactive transcription entries
- Audio replay functionality
- Settings dialog with comprehensive options

• Display Options
- Configurable history limits (1-1000 entries)
- Time format options (12/24-hour)
- File metadata display
- Archive management interface

================================================================================
📊 ADVANCED DISPLAY & MONITORING
================================================================================

• Real-time Statistics
- Files remaining counter
- Total processed counter
- Archive file counter
- Processing status indicators

• Quality Control
- Confidence-based filtering
- Quality indicators with visual feedback
- Transcription accuracy assessment
- Error logging and debugging

• Archive Management
- File counting and organization
- Bulk deletion capabilities
- Archive clearing functions
- File retention policies
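
The summary does not say how confidence is computed or where the Green/Yellow/Red boundaries sit. As a rough sketch under those caveats: one common way to get a 0-1 score is to take the exponential of the average token log-probability, then bucket it. The thresholds 0.85 and 0.60 and both function names are assumptions picked for illustration.

```python
import math
from typing import Sequence


def confidence_from_logprobs(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability: exp of the average log-probability."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def confidence_color(confidence: float,
                     green_at: float = 0.85,    # assumed threshold
                     yellow_at: float = 0.60) -> str:  # assumed threshold
    """Map a 0-1 confidence score to a display color."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= green_at:
        return "green"
    if confidence >= yellow_at:
        return "yellow"
    return "red"
```

Confidence-based filtering is then a single comparison against a minimum score before an entry is shown or archived.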

================================================================================
🔧 CONFIGURATION & SETTINGS
================================================================================

• Model Management
- Whisper model selection (currently a custom Whisper model fine-tuned on over 2000 radio transmissions for more accurate transcriptions; the more training clips, the better)
- Custom model integration
- Model comparison tools
- Performance optimization settings

• API Integration
- Gemini API key management
- Hugging Face token authentication
- External service connectivity
- Secure credential storage

• Processing Options
- VAD configuration
- File rejection settings
- Cleanup preferences
- Worker thread management

• Display Preferences
- Theme selection
- Time format configuration
- History limits
- Auto-scroll settings

================================================================================
📝 REVIEW & QUALITY CONTROL TOOLS
================================================================================

• Review Tool
- Manual transcription review interface
- Audio playback controls
- Batch processing capabilities
- File organization (processed/deleted/skipped)
- Model comparison with multiple variants
- CSV export functionality

• WER (Word Error Rate) Measuring Tool
- Accuracy assessment of transcription models
- Performance comparison between models
- Statistical analysis of transcription quality
- Metadata management for evaluation datasets

• Quality Assessment
- Confidence scoring algorithms
- Quality-based filtering
- Error detection and reporting
- Performance metrics tracking
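
For readers new to WER: it is the word-level edit distance between the reference transcript and the model output (substitutions + deletions + insertions), divided by the number of reference words. Below is a minimal sketch of that standard formula, not the tool's own code. Lowercasing and whitespace splitting are simplifying assumptions; real evaluations usually normalize punctuation and numbers as well.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

For example, `wer("engine five on scene", "engine nine on scene")` is 0.25: one substituted word out of four. Because insertions count as errors, WER can exceed 1.0.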

================================================================================
🎓 FINE-TUNING CAPABILITIES
================================================================================

• Custom Model Training
- Domain-specific audio training
- Dataset preparation tools
- Training configuration management
- Model optimization techniques

• Training Configuration
- Epoch management (default: 6)
- Batch size optimization (default: 2)
- Gradient accumulation steps (default: 2)
- Learning rate adjustment (default: 1e-5)
- Warmup ratio settings (default: 0.1)
- Mixed precision training (FP16)

• Performance Optimization
- GPU memory management
- Gradient checkpointing
- Memory-efficient attention
- Training progress monitoring
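
To show how the listed defaults interact, here is a small sketch of the arithmetic. The class `TrainingConfig` and its methods are hypothetical, and the step counts are a simplification; an actual trainer may round slightly differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Defaults copied from the feature list above.
    epochs: int = 6
    batch_size: int = 2
    grad_accum_steps: int = 2
    learning_rate: float = 1e-5
    warmup_ratio: float = 0.1
    fp16: bool = True

    @property
    def effective_batch_size(self) -> int:
        """Clips contributing to each optimizer update."""
        return self.batch_size * self.grad_accum_steps

    def total_steps(self, num_clips: int) -> int:
        """Optimizer updates over the whole run (simplified)."""
        return self.epochs * math.ceil(num_clips / self.effective_batch_size)

    def warmup_steps(self, num_clips: int) -> int:
        """Updates spent ramping the learning rate up to its target."""
        return math.ceil(self.total_steps(num_clips) * self.warmup_ratio)
```

With the defaults, each update sees 2 x 2 = 4 clips. Taking a round 2000 clips as an example, that is 500 updates per epoch, 3000 in total, with the first 300 used for warmup.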

================================================================================
🔄 FILE MANAGEMENT & PROCESSING
================================================================================

• File Operations
- Automatic file detection
- File stability verification
- Metadata extraction
- Format conversion support
- Error handling and recovery

• Archive System
- Processed file storage
- File organization
- Archive management tools
- Bulk operations support

• Processing Pipeline
- Multi-stage audio processing
- Quality control integration
- Error recovery mechanisms
- Performance optimization

================================================================================
📈 PERFORMANCE & MONITORING
================================================================================

• System Monitoring
- GPU memory usage tracking
- Processing statistics
- Performance metrics
- Resource utilization

• Logging & Debugging
- Comprehensive log generation
- Error tracking and reporting
- Debug information output
- Performance analysis tools

• Optimization Features
- GPU memory optimization
- Processing efficiency improvements
- Resource management
- Performance tuning options

================================================================================
🔐 SECURITY & INTEGRATION
================================================================================

• Authentication
- API key management
- Hugging Face integration
- Secure credential storage
- Token-based authentication

• Cross-Platform Support
- Windows compatibility
- Linux support
- macOS compatibility
- Platform-specific optimizations

• Data Security
- Secure configuration storage
- Encrypted credential handling
- Privacy protection measures
- Data integrity verification

================================================================================
🎯 USE CASES & APPLICATIONS
================================================================================

• Radio Communications
- Real-time transcription
- Communication monitoring
- Quality assessment
- Archive management

• Public Safety
- Emergency communications
- Audio processing
- Quality control
- Documentation

• Quality Assurance
- Transcription accuracy
- Model evaluation
- Performance assessment
- Continuous improvement

• Research & Development
- Model fine-tuning
- Dataset preparation
- Performance analysis
- Custom model development

================================================================================
🛠 TECHNICAL ARCHITECTURE
================================================================================

• Framework & Libraries
- PySide6 GUI framework
- PyTorch for ML operations
- Transformers for model handling
- Librosa for audio processing

• Design Patterns
- Multi-threading architecture
- Signal-based communication
- Modular component design
- Event-driven processing

• Configuration Management
- INI file configuration
- Dynamic settings updates
- Persistent storage
- Environment-specific settings

• Error Handling
- Graceful degradation
- Error recovery mechanisms
- Comprehensive logging
- User-friendly error messages
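
As an illustration of the INI-based configuration, here is a sketch using the standard-library `configparser`. The section names, key names, and fallback values are assumptions; only the ranges (1-16 workers, 0.0-1.0 VAD threshold, 1-1000 history entries, 12/24-hour time) come from the feature list.

```python
import configparser
from dataclasses import dataclass


@dataclass
class Settings:
    workers: int
    vad_threshold: float
    history_limit: int
    use_24_hour: bool


def _clamp(value, low, high):
    """Keep value inside [low, high]."""
    return max(low, min(high, value))


def load_settings(ini_text: str) -> Settings:
    """Parse INI text, fall back to defaults, and clamp to documented ranges."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Section and key names below are hypothetical.
    workers = parser.getint("processing", "workers", fallback=4)
    vad = parser.getfloat("processing", "vad_threshold", fallback=0.5)
    history = parser.getint("display", "history_limit", fallback=100)
    time_format = parser.get("display", "time_format", fallback="12")
    return Settings(
        workers=_clamp(workers, 1, 16),
        vad_threshold=_clamp(vad, 0.0, 1.0),
        history_limit=_clamp(history, 1, 1000),
        use_24_hour=time_format.strip() == "24",
    )
```

Clamping on load means a hand-edited file with an out-of-range value degrades to the nearest valid setting instead of crashing the app.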

================================================================================
📋 SYSTEM REQUIREMENTS
================================================================================

• Hardware
- CUDA-compatible GPU (recommended)
- Minimum 8GB RAM
- Sufficient storage for audio files
- Audio input/output capabilities

• Software
- Python 3.8+
- PySide6
- PyTorch with CUDA support
- Audio processing libraries

• Dependencies
- Transformers library
- Librosa for audio
- Mutagen for metadata
- NumPy for calculations

================================================================================
🚀 FUTURE ENHANCEMENTS
================================================================================

• Planned Features
- Additional model support
- Enhanced GUI capabilities
- Improved performance optimization
- Extended file format support

• Development Roadmap
- Advanced quality metrics
- Real-time collaboration tools
- Cloud integration options
- Mobile application support

================================================================================

This comprehensive system provides end-to-end audio transcription capabilities
with advanced quality control, review tools, and fine-tuning capabilities for
specialized use cases in radio communications and public safety applications.

================================================================================

merlin · Aug 10, 2025

A clever notion. I thought of that a while back. I have a hardware transcriber that works OK, but a good clear voice helps.
Look into Rust if you want to develop something along these lines; it supports both Linux and Windows.

Vixus · Aug 16, 2025

ScanScribe said:
Update on ScanScribe.

Thanks for the update!
What a wonderful and ambitious project! I've auditory processing issues and am not a native English speaker, so I've been looking into a solution like this myself to help supplement the 'noisy signal' I hear with something I can read.

I'm currently running a somewhat jank split audio signal setup that is routed through a Python project named Radiodictator. It uses Whisper and monitors a virtual audio input. It is OK, but incredibly lackluster unless you follow the conversation live while looking at raw console output.

If you'd like to have someone assist with testing, you're welcome to shoot me a DM.
Win/Linux - RTX 3090Ti -- SDS200E + 2X RTL-SDR

ScanScribe · Aug 20, 2025

I have a release ready for testing for anyone interested. DM me.

I8brwork · Sep 17, 2025

ScanScribe said:
I have a release ready for testing for anyone interested. DM me.

Interested to try this out

Chinook75 · Sep 22, 2025

I am interested to try this out as well. Using rtl-sdr v3/4 and airspy mini. Monitoring civ and mil airband

sidthekid1998 · Sep 22, 2025

Hi I’d be interested to try it out as well

brickhouse554 · Sep 25, 2025

I'd be interested as well

ScanScribe · Sep 25, 2025

Cancelling the project. All the people who were interested ghosted me after I gave them access. Some even claimed my work is a virus, even with proof that it isn't. Guess this will be a personal program.

sidthekid1998 · Sep 25, 2025

Nah man that’s the next gen !!!

Trilliumaire · Tuesday at 8:16 PM

Maybe just open source it. I am an AI developer, and I also moderate the OpenAI Developer Forum. Whisper is a great model to run locally and is still relevant. Besides transcription, you can mean-pool the hidden layers to create embedding vectors, which are useful for clustering and classification.
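
A minimal sketch of the mean-pooling idea, assuming you have already pulled frame-level hidden states out of the model as a frames x dimensions array. Plain Python lists are used here so the example runs anywhere; with PyTorch the same thing is a mean over the time axis. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_pool(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level vectors (frames x dims) into one fixed-size embedding."""
    if not hidden_states:
        raise ValueError("need at least one frame")
    n_frames = len(hidden_states)
    # zip(*rows) walks the array column by column, i.e. one dimension at a time.
    return [sum(column) / n_frames for column in zip(*hidden_states)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity of two embeddings; values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Each clip then becomes one vector regardless of its length, and clips can be compared with `cosine_similarity` or fed to any clustering or classification method.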
