Every word, captured precisely
FrameCounsel uses MLX Whisper, a port of OpenAI's Whisper model to Apple's MLX framework, running entirely on your Mac's Apple Silicon to transcribe body camera audio, dashcam recordings, surveillance footage, and witness interviews. No audio ever leaves your machine. Timestamps are aligned to individual video frames, making every spoken word searchable, citable, and linkable to the exact moment it was said.
A streamlined workflow designed for defense attorneys, not forensic engineers.
Drag body camera footage, dashcam video, or any evidence recording into FrameCounsel. Supports MP4, MOV, AVI, MKV, and other common video formats.
The MLX-optimized Whisper large-v3 model runs locally on your Mac's Apple Silicon, transcribing audio with word-level timestamps. Processing speed approaches real time on M2 Pro and faster chips.
Automatic speaker separation identifies distinct voices in the recording, labeling them as Speaker 1, Speaker 2, etc. You can assign names (Officer, Defendant, Witness) for clarity.
The final transcript is fully searchable, with every word linked to its exact video frame. Click any line to jump to that moment. Export it as timestamped text or SRT subtitles, or embed it in court reports.
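As an illustration of the SRT export step, here is a minimal Python sketch. It is not FrameCounsel's implementation. It assumes transcript segments shaped like the ones open-source Whisper implementations return (a dict with "start" and "end" in seconds and a "text" string), plus a hypothetical optional "speaker" key carrying a diarization label such as "Speaker 1". The function names are made up for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments, speaker_names=None):
    """Render transcript segments as SRT text.

    Each segment is a dict with "start", "end" (seconds) and "text".
    The optional "speaker" key is a hypothetical diarization label;
    speaker_names maps such labels to assigned names (e.g. "Officer").
    """
    speaker_names = speaker_names or {}
    cues = []
    for index, seg in enumerate(segments, start=1):
        label = seg.get("speaker")
        label = speaker_names.get(label, label)  # fall back to the raw label
        text = seg["text"].strip()
        if label:
            text = f"{label}: {text}"
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT cues are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Step out of the vehicle.",
         "speaker": "Speaker 1"},
        {"start": 3661.25, "end": 3663.0, "text": " I don't see a weapon."},
    ]
    print(to_srt(demo, {"Speaker 1": "Officer"}))
```

Timestamps are rounded to the millisecond before being split into fields, so a value such as 59.9996 seconds rolls over cleanly to 00:01:00,000 instead of producing an invalid 1000 ms field.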
Purpose-built capabilities for criminal defense evidence analysis.
MLX Whisper is built on Apple's MLX framework for Apple Silicon, achieving near real-time transcription on M2 Pro and above without any cloud processing.
Audio never leaves your Mac. No API calls, no cloud uploads, no third-party processing. Privileged case recordings stay under your physical control.
Automatic detection and labeling of multiple speakers. Distinguish between officers, defendants, witnesses, and bystanders in multi-party recordings.
Every word is timestamped to the video frame. Search for any phrase and jump directly to the exact moment it was spoken.
Queue dozens of video files for overnight transcription. Process an entire case's worth of body camera footage while you sleep.
Export transcripts as timestamped text, SRT subtitle files, or Word documents, or embed them directly in FrameCounsel's court-ready reports.
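To make the frame-linked search concrete, the sketch below looks up a phrase in a word-level transcript and converts each hit to a video frame index. It is an illustration under stated assumptions, not FrameCounsel's code. The word entries are assumed to look like Whisper-style word timestamps (a "word" string, often with a leading space, and a "start" time in seconds), the frame rate is assumed to be constant, and the helper names are hypothetical.

```python
import re
from fractions import Fraction


def normalize(token: str) -> str:
    """Lowercase a token and strip surrounding whitespace and punctuation."""
    return re.sub(r"^\W+|\W+$", "", token.lower())


def frame_index(seconds: float, fps: Fraction) -> int:
    """Index of the frame on screen at the given time (constant frame rate).

    Exact rational arithmetic avoids float drift with rates such as
    30000/1001 (29.97 fps).
    """
    return int(Fraction(str(seconds)) * fps)


def find_phrase(words, phrase, fps):
    """Return (start_seconds, frame) for every occurrence of the phrase."""
    target = [normalize(t) for t in phrase.split()]
    if not target:
        return []
    tokens = [normalize(w["word"]) for w in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            start = words[i]["start"]
            hits.append((start, frame_index(start, fps)))
    return hits


if __name__ == "__main__":
    ntsc = Fraction(30000, 1001)  # 29.97 fps, common for body cameras
    words = [
        {"word": " I", "start": 12.10},
        {"word": " don't", "start": 12.48},
        {"word": " see", "start": 12.80},
        {"word": " a", "start": 13.02},
        {"word": " weapon.", "start": 13.10},
    ]
    for start, frame in find_phrase(words, "I don't see a weapon", ntsc):
        print(f"{start:.2f}s -> frame {frame}")
```

Matching is done on normalized tokens, so capitalization and trailing punctuation in the transcript do not hide a hit. Variable-frame-rate recordings would need the container's real presentation timestamps instead of a single fps value.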
How defense teams use this capability to protect their clients' rights.
Scenario
A public defender receives 14 hours of body camera footage across 6 officers for a single incident. Manual transcription would cost thousands and take weeks.
Outcome
FrameCounsel transcribes all 14 hours overnight on the defender's MacBook Pro. By morning, every word is searchable. A keyword search for "weapon" reveals an officer stating "I don't see a weapon" — a critical admission absent from the arrest report.
Scenario
A witness recorded a video statement on their phone. The prosecution's summary of the statement differs from what the witness actually said.
Outcome
FrameCounsel's frame-accurate transcript of the original recording proves the prosecution's characterization was inaccurate. The defense presents the timestamped transcript alongside the video in a motion to exclude the summary.
Scenario
Body camera footage captures a conversation partially in Spanish. The officer's report only documents the English portions, omitting exculpatory statements made in Spanish.
Outcome
Whisper's multilingual capability transcribes both the English and the Spanish speech. The defense identifies the exculpatory statements the report omitted and raises them in a Brady filing.
MLX Whisper on Apple Silicon
Uses OpenAI Whisper large-v3 model optimized for Apple's MLX framework
Runs on Apple Silicon (M1/M2/M3/M4 series) through MLX, which executes on the GPU and CPU; a 16-core or higher configuration is recommended for optimal performance
Near real-time processing speed on M2 Pro and above (1 hour of audio in ~60 minutes); roughly twice as long on base M1
Word-level timestamp accuracy within 20-50ms of actual speech onset
Supports 99+ languages with automatic language detection
Speaker diarization uses spectral clustering for voice separation
Memory-efficient processing handles multi-hour recordings without running out of RAM
All model weights stored locally — works fully offline in air-gapped mode
Common questions about on-device video transcription.
Processing speed depends on your Apple Silicon chip. On M2 Pro and above, transcription approaches real-time speed (1 hour of audio takes roughly 1 hour to process). On base M1, expect processing to take roughly twice the length of the recording (about 2 hours per hour of audio). You can queue multiple files for batch processing overnight.
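For planning a batch, the figures above can be turned into a rough estimate. The Python sketch below is illustrative only: the real-time factors are read off the approximate numbers quoted here (about 1 hour of processing per hour of audio on M2 Pro, and about 2 hours per hour on base M1, which is this example's reading of the M1 figure). They are not measured benchmarks, and the function name is hypothetical.

```python
from datetime import timedelta

# Assumed real-time factors: processing time divided by audio duration.
# Illustrative values taken from the approximate figures quoted above.
REAL_TIME_FACTOR = {"M2 Pro": 1.0, "M1": 2.0}


def estimate_processing_time(durations_minutes, chip):
    """Estimate total processing time for a queue of recordings."""
    total_minutes = sum(durations_minutes)
    return timedelta(minutes=total_minutes * REAL_TIME_FACTOR[chip])


if __name__ == "__main__":
    queue = [90, 45, 60]  # recording lengths in minutes
    for chip in REAL_TIME_FACTOR:
        print(chip, estimate_processing_time(queue, chip))
```

Actual throughput will vary with the chip configuration, the recording, and whatever else the machine is doing, so treat the result as a scheduling aid, not a guarantee.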
Whisper large-v3 is one of the most robust speech recognition models available. The original Whisper models were trained on 680,000 hours of diverse audio, including noisy environments, and large-v3 was trained on a substantially larger dataset. It handles wind noise, traffic, radio chatter, and overlapping voices well. For extremely degraded audio, FrameCounsel provides confidence scores per segment so you know which portions may need manual verification.
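How FrameCounsel computes its per-segment confidence scores is not described here. As a hedged illustration of the idea, the Python sketch below flags segments for manual review using two fields that open-source Whisper implementations attach to each segment: "avg_logprob" (mean token log-probability) and "no_speech_prob". The thresholds and function names are hypothetical choices for this example.

```python
import math


def needs_review(segment, min_confidence=0.5, max_no_speech=0.6):
    """True if a segment looks unreliable under the example thresholds.

    Confidence is approximated as exp(avg_logprob), i.e. the geometric
    mean of the token probabilities in the segment.
    """
    confidence = math.exp(segment["avg_logprob"])
    return confidence < min_confidence or segment["no_speech_prob"] > max_no_speech


def flag_segments(segments, **thresholds):
    """Return (start, end, confidence) for every segment needing review."""
    return [
        (seg["start"], seg["end"], round(math.exp(seg["avg_logprob"]), 2))
        for seg in segments
        if needs_review(seg, **thresholds)
    ]


if __name__ == "__main__":
    demo = [
        {"start": 0, "end": 4, "avg_logprob": -0.15, "no_speech_prob": 0.02},
        {"start": 4, "end": 9, "avg_logprob": -1.2, "no_speech_prob": 0.10},
        {"start": 9, "end": 12, "avg_logprob": -0.3, "no_speech_prob": 0.80},
    ]
    for start, end, conf in flag_segments(demo):
        print(f"review {start}s-{end}s (confidence ~{conf})")
```

A segment is flagged either because the model was unsure of the words it produced or because it suspected there was no speech at all, which are the two situations where a human should listen to the original audio.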
Yes. FrameCounsel provides an integrated transcript editor where you can correct any errors, assign speaker names, and add annotations. Edits are tracked separately from the original AI-generated transcript, so the original is always preserved intact.
If a video file contains no audio track, FrameCounsel will detect this and skip transcription for that file. It will still process the video for visual analysis features like object tracking, face recognition, and timeline building.
Download FrameCounsel and start using on-device video transcription on your next case. 30-day free trial. No credit card. 100% on-device.