Transcription
Generate accurate speech-to-text transcripts using on-device MLX Whisper, with speaker diarization, noise handling, and export to SRT, VTT, and TXT formats.
FrameCounsel uses Apple's MLX framework to run OpenAI's Whisper model entirely on your Mac. Transcription is fully local, meaning no audio data ever leaves your device. This is critical for maintaining attorney-client privilege and complying with evidence handling requirements.
MLX Whisper Setup
The Whisper model is bundled with FrameCounsel and optimized for your Apple Silicon hardware during first launch. FrameCounsel ships with two model variants:
- Standard (whisper-large-v3) - Best accuracy for clear audio. Processes at approximately 10x real-time on M2 Pro, or about six minutes per hour of audio.
- Turbo (whisper-large-v3-turbo) - Faster processing with slightly reduced accuracy. Ideal for initial review of lengthy recordings.
Select your preferred model in Settings > Transcription > Model. You can switch models between transcription runs without losing previous results.
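FrameCounsel runs the model for you, so no scripting is required. For readers who want to see what on-device MLX Whisper transcription looks like outside the app, the following is a minimal sketch using the open-source `mlx-whisper` Python package. The `MODEL_REPOS` mapping, the repository names, and the `transcribe_locally` helper are assumptions made for illustration and are not part of FrameCounsel. Unlike the bundled model, the package may need to download weights the first time it runs; the audio itself is still processed locally.

```python
# Hypothetical mapping from the two model variants to Hugging Face repositories
# usable with the open-source mlx-whisper package. This is an assumption for
# illustration; FrameCounsel bundles its own weights.
MODEL_REPOS = {
    "standard": "mlx-community/whisper-large-v3-mlx",
    "turbo": "mlx-community/whisper-large-v3-turbo",
}


def transcribe_locally(audio_path: str, model: str = "standard") -> dict:
    """Transcribe an audio file on-device with mlx-whisper (Apple Silicon only)."""
    if model not in MODEL_REPOS:
        raise ValueError(
            f"unknown model {model!r}; expected one of {sorted(MODEL_REPOS)}"
        )
    # Imported lazily so this module still loads where mlx-whisper is absent.
    import mlx_whisper

    return mlx_whisper.transcribe(
        audio_path,
        path_or_hf_repo=MODEL_REPOS[model],
        word_timestamps=True,  # request per-word timing in the result
    )
```

The returned dictionary contains the full text under `"text"` and timed segments under `"segments"`. Switching the `model` argument mirrors switching variants in Settings: a quick Turbo pass first, then a Standard pass on the recordings that matter.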
First-Time Model Optimization
On first use, FrameCounsel converts the Whisper model weights to MLX format optimized for your specific chip. This takes 2-5 minutes on M1 and under a minute on M2 Pro or later. Subsequent transcriptions start immediately.
Speaker Diarization
FrameCounsel automatically identifies and labels distinct speakers in the audio. Each speaker is assigned a color-coded label (e.g., Speaker A, Speaker B). You can rename speakers to real names by right-clicking any speaker label in the transcript panel.
Diarization is especially valuable in body camera footage where multiple officers, suspects, and bystanders may be speaking simultaneously. The algorithm handles overlapping speech and distinguishes speakers even when they have similar vocal characteristics.
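Conceptually, a diarized transcript is a list of timed segments, each tagged with a speaker label, and renaming a speaker is a relabeling of those segments. The sketch below illustrates that idea. The `Segment` type and `rename_speakers` function are hypothetical and do not describe FrameCounsel's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds
    speaker: str   # diarization label, e.g. "Speaker A"
    text: str


def rename_speakers(segments, names):
    """Return new segments with diarization labels replaced by real names.

    Labels that do not appear in `names` are kept unchanged.
    """
    return [
        Segment(s.start, s.end, names.get(s.speaker, s.speaker), s.text)
        for s in segments
    ]


segments = [
    Segment(0.0, 2.1, "Speaker A", "Step out of the vehicle."),
    Segment(2.1, 3.0, "Speaker B", "Okay."),
]
renamed = rename_speakers(segments, {"Speaker A": "Officer Diaz"})
```

Returning new segments instead of editing them in place keeps the original diarization output available for comparison.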
Handling Noisy Audio
Body camera audio is rarely clean. FrameCounsel includes preprocessing filters specifically tuned for law enforcement recording conditions:
- Wind noise suppression for outdoor encounters
- Radio chatter isolation to separate dispatch audio from scene audio
- Siren and alarm filtering to recover speech beneath loud tonal noise
- Crosstalk separation when multiple body cameras capture the same scene
Enable audio preprocessing in the transcription settings, or with its keyboard shortcut, before starting transcription.
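The filters FrameCounsel applies are not specified here. As a generic illustration of the first item in the list above: wind noise is concentrated at low frequencies, so a common textbook approach is a high-pass filter that attenuates low-frequency rumble while passing the speech band. The sketch below is a first-order high-pass filter in plain Python. The function name and the 100 Hz default cutoff are assumptions, and this is not FrameCounsel's algorithm.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz=100.0):
    """First-order (RC-style) high-pass filter over a list of float samples."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)

    filtered = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # The output follows changes in the input and decays toward zero
        # when the input stays constant.
        y = alpha * (prev_out + x - prev_in)
        filtered.append(y)
        prev_in, prev_out = x, y
    return filtered
```

A constant offset fed through this filter decays toward zero, while rapidly changing content passes nearly unchanged. Production noise suppression is considerably more sophisticated than this.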
Editing Transcripts
After transcription, review and correct the output directly in the Transcript Panel. Click any word to edit it in place. Corrections are tracked as manual overrides and the original machine transcription is preserved for transparency.
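One way to picture override tracking is a record that stores the machine output and the manual correction side by side, so the displayed text can change while the original is never overwritten. The following is a minimal sketch of that idea. The `Word` class and its field names are hypothetical, not FrameCounsel's storage format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    machine_text: str               # original model output, never modified
    override: Optional[str] = None  # manual correction, if one was made

    @property
    def text(self) -> str:
        """Text to display: the correction if present, else the machine output."""
        return self.override if self.override is not None else self.machine_text

    def correct(self, new_text: str) -> None:
        """Record a manual correction without touching the machine output."""
        self.override = new_text


word = Word("Glock")
word.correct("clock")
```

After the correction, `word.text` is the edited value and `word.machine_text` still holds what the model produced, which is the property that makes the edit auditable.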
Confidence Highlighting
Words with low transcription confidence are highlighted in amber. Focus your review on these words first to maximize accuracy with minimal effort.
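The review strategy above amounts to filtering words by a confidence score and visiting the least certain ones first. A small sketch, assuming each word carries a confidence between 0 and 1; the 0.6 threshold, the dictionary keys, and the function name are illustrative assumptions, not FrameCounsel's actual values.

```python
def words_to_review(words, threshold=0.6):
    """Return (index, text, confidence) for words below threshold, least confident first."""
    flagged = [
        (index, word["text"], word["confidence"])
        for index, word in enumerate(words)
        if word["confidence"] < threshold
    ]
    return sorted(flagged, key=lambda item: item[2])


words = [
    {"text": "the", "confidence": 0.98},
    {"text": "holster", "confidence": 0.55},
    {"text": "Glock", "confidence": 0.41},
]
queue = words_to_review(words)
```

Keeping the original index in each tuple makes it possible to jump back to the word's position in the transcript.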
Export Formats
Export your finalized transcript in multiple formats:
| Format | Use Case |
|---|---|
| SRT | Subtitles for video playback in court presentation |
| VTT | Web-compatible captions for digital evidence portals |
| TXT | Plain text for inclusion in motions and briefs |
| | Formatted transcript with speaker labels and timestamps |
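The two caption formats in the table are structurally similar. SRT numbers each cue and writes timestamps with a comma before the milliseconds, while VTT starts with a `WEBVTT` line and uses a period. The sketch below shows both layouts for a list of `(start, end, text)` cues. The function names and cue shape are assumptions for illustration, and the output omits the header FrameCounsel adds to its exports.

```python
def format_timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (',' for SRT, '.' for VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(cues) -> str:
    """Render cues as SRT: numbered blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(cues) -> str:
    """Render cues as WebVTT: a WEBVTT line, then blocks separated by blank lines."""
    blocks = ["WEBVTT\n"]
    for start, end, text in cues:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        blocks.append(f"{timing}\n{text}\n")
    return "\n".join(blocks)


cues = [
    (0.0, 2.5, "Speaker A: Step out of the vehicle."),
    (2.5, 4.0, "Speaker B: Okay."),
]
srt_text = to_srt(cues)
vtt_text = to_vtt(cues)
```

Timestamps are converted to whole milliseconds before being split into fields, which avoids floating-point rounding producing values such as `59,1000`.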
Export from File > Export Transcript or with its keyboard shortcut. All exports include a header with the source file hash and transcription parameters for evidentiary integrity.
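A recipient can check such a header by hashing their own copy of the source file and comparing the result. The sketch below shows the general idea. SHA-256, the header layout, and the function names are assumptions, since the exact algorithm and format FrameCounsel uses are not stated here.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large video files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def export_header(source_path, params):
    """Build a plain-text header: the file hash, then parameters sorted by name."""
    lines = [f"Source SHA-256: {file_sha256(source_path)}"]
    lines += [f"{key}: {value}" for key, value in sorted(params.items())]
    return "\n".join(lines)
```

If the hash computed from a copy of the recording matches the one in the header, the transcript was produced from a byte-identical file.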
Next Steps
With your transcript ready, use Report Comparison to cross-reference the spoken record against police narrative reports.