AI Intelligence

On-Device Video Transcription

Every word, captured precisely

FrameCounsel uses MLX Whisper, an implementation of OpenAI's Whisper model optimized for Apple's MLX framework, running entirely on your Mac's Apple Silicon to transcribe body camera audio, dashcam recordings, surveillance footage, and witness interviews. No audio ever leaves your machine. Timestamps are synchronized to the video frame level, making every spoken word searchable, citable, and linkable to the exact moment it was said.

Try It Free See All Features

How It Works

From Evidence to Insight

A streamlined workflow designed for defense attorneys, not forensic engineers.

Import Video Evidence

Drag body camera footage, dashcam video, or any evidence recording into FrameCounsel. Supports MP4, MOV, AVI, MKV, and other common video formats.

MLX Whisper Processing

The MLX-optimized Whisper large-v3 model runs on your Mac's Apple Silicon GPU, transcribing audio with word-level timestamps. Processing speed approaches real-time on M2 Pro and faster chips.
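As a rough illustration of this step (not FrameCounsel's actual code, which is not documented here), the open-source mlx-whisper Python package exposes a `transcribe` function that can return word-level timestamps. The sketch below assumes that package is installed and that the `mlx-community/whisper-large-v3-mlx` weights are available; `transcribe_with_words` and `flatten_words` are hypothetical helper names.

```python
from typing import Any


def transcribe_with_words(audio_path: str) -> dict[str, Any]:
    """Transcribe one file locally and request word-level timestamps."""
    # Imported lazily so flatten_words() works without MLX installed.
    import mlx_whisper

    return mlx_whisper.transcribe(
        audio_path,
        # Assumed model location; a local directory path also works here.
        path_or_hf_repo="mlx-community/whisper-large-v3-mlx",
        word_timestamps=True,
    )


def flatten_words(result: dict[str, Any]) -> list[tuple[str, float, float]]:
    """Collect (word, start_seconds, end_seconds) across all segments."""
    words = []
    for segment in result.get("segments", []):
        # Segments without word timing are skipped rather than guessed at.
        for w in segment.get("words", []):
            words.append((w["word"].strip(), float(w["start"]), float(w["end"])))
    return words
```

Pointing `path_or_hf_repo` at a local folder of weights is one way to keep a setup like this fully offline after the initial download.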

Speaker Diarization

Automatic speaker separation identifies distinct voices in the recording, labeling them as Speaker 1, Speaker 2, etc. You can assign names (Officer, Defendant, Witness) for clarity.
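One common way to attach speaker labels to a transcript is to give each word the speaker whose turn overlaps it the most, then swap in the names you assigned. The sketch below assumes that approach and a simple tuple layout for words and turns; it is an illustration only, and `label_words` is a hypothetical name rather than a FrameCounsel function.

```python
from typing import Optional


def label_words(
    words: list[tuple[str, float, float]],
    turns: list[tuple[str, float, float]],
    names: Optional[dict[str, str]] = None,
) -> list[tuple[Optional[str], str]]:
    """Pair each word with the speaker whose turn overlaps it most.

    words: (word, start_seconds, end_seconds)
    turns: (speaker_label, start_seconds, end_seconds)
    names: optional mapping such as {"Speaker 1": "Officer"}
    """
    names = names or {}
    labeled = []
    for word, w_start, w_end in words:
        best_speaker, best_overlap = None, 0.0
        for speaker, t_start, t_end in turns:
            # Length of the intersection of the two intervals (negative if disjoint).
            overlap = min(w_end, t_end) - max(w_start, t_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        # Words that overlap no turn keep a speaker of None.
        labeled.append((names.get(best_speaker, best_speaker), word))
    return labeled
```

Keeping the generic labels in the data and applying names through a separate mapping means a name can be corrected later without re-running diarization.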

Searchable, Linked Transcript

The final transcript is fully searchable, with every word linked to its exact video frame. Click any line to jump to that moment. Export it as timestamped text or SRT subtitles, or embed it in court reports.
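To make the SRT export concrete, here is a minimal sketch of turning timed transcript segments into SRT cues. The segment layout (`start`, `end`, `text`, optional `speaker`) and the function names are assumptions for illustration, not FrameCounsel's export code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT time code HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        text = seg["text"].strip()
        speaker = seg.get("speaker")
        # Prefix the speaker name when one has been assigned.
        line = f"{speaker}: {text}" if speaker else text
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{line}\n")
    return "\n".join(cues)
```

For example, a segment running from 0 to 1.25 seconds with speaker "Officer" becomes a cue numbered 1 with the time line `00:00:00,000 --> 00:00:01,250`.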

Key Capabilities

What You Can Do

Purpose-built capabilities for criminal defense evidence analysis.

Apple Silicon Native

MLX Whisper is built on MLX, Apple's machine-learning framework for Apple Silicon, achieving near real-time transcription on M2 Pro and above without any cloud processing.

Zero Data Exposure

Audio never leaves your Mac. No API calls, no cloud uploads, no third-party processing. Privileged case recordings stay under your physical control.

Speaker Diarization

Automatic detection and labeling of multiple speakers. Distinguish between officers, defendants, witnesses, and bystanders in multi-party recordings.

Frame-Accurate Timestamps

Every word is timestamped to the video frame. Search for any phrase and jump directly to the exact moment it was spoken.
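Linking a word to a frame comes down to simple arithmetic: multiply the word's start time by the frame rate and round down. The sketch below assumes constant-frame-rate video and uses exact fractions so that rates like 30000/1001 (29.97 fps) do not drift; `time_to_frame` and `frame_to_time` are hypothetical names, and variable-frame-rate recordings would need per-frame timestamps instead.

```python
import math
from fractions import Fraction


def time_to_frame(seconds: float, fps: Fraction) -> int:
    """Index of the frame on screen at `seconds` (frame 0 starts at t=0)."""
    # Going through str() keeps decimal inputs such as 0.3 exact.
    return math.floor(Fraction(str(seconds)) * fps)


def frame_to_time(frame: int, fps: Fraction) -> float:
    """Start time in seconds of the given frame index."""
    return float(Fraction(frame) / fps)
```

At 30 fps, a word that starts at 12.48 seconds lands on frame 374, and seeking to `frame_to_time(374, Fraction(30))` returns the start of that frame.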

Batch Processing

Queue dozens of video files for overnight transcription. Process an entire case's worth of body camera footage while you sleep.

Multiple Export Formats

Export transcripts as timestamped text, SRT subtitle files, or Word documents, or embed them directly in FrameCounsel's court-ready reports.

Real-World Scenarios

Defense Attorneys In Action

How defense teams use this capability to protect their clients' rights.

Body Camera Audio Recovery

Scenario

A public defender receives 14 hours of body camera footage across 6 officers for a single incident. Manual transcription would cost thousands and take weeks.

Outcome

FrameCounsel transcribes all 14 hours overnight on the defender's MacBook Pro. By morning, every word is searchable. A keyword search for "weapon" reveals an officer stating "I don't see a weapon" — a critical admission absent from the arrest report.

Witness Statement Verification

Scenario

A witness recorded a video statement on their phone. The prosecution's summary of the statement differs from what the witness actually said.

Outcome

FrameCounsel's frame-accurate transcript of the original recording proves the prosecution's characterization was inaccurate. The defense presents the timestamped transcript alongside the video in a motion to exclude the summary.

Multi-Language Evidence

Scenario

Body camera footage captures a conversation partially in Spanish. The officer's report only documents the English portions, omitting exculpatory statements made in Spanish.

Outcome

Whisper's multilingual capability transcribes both the English and the Spanish speech. The defense identifies critical exculpatory statements that the report omitted and files them as Brady material.

Under the Hood

Technical Details

MLX Whisper on Apple Silicon

Uses OpenAI Whisper large-v3 model optimized for Apple's MLX framework

Runs on Apple Silicon (M1/M2/M3/M4 series); 16-core or higher configurations recommended for optimal performance

Near real-time processing speed on M2 Pro and above (1 hour of audio in ~60 minutes); roughly twice as long on base M1

Word-level timestamp accuracy within 20-50ms of actual speech onset

Supports 99+ languages with automatic language detection

Speaker diarization uses spectral clustering for voice separation

Memory-efficient processing handles multi-hour recordings without running out of RAM

All model weights stored locally — works fully offline in air-gapped mode

FAQ

Frequently Asked Questions

Common questions about on-device video transcription.

How long does transcription take?

Processing speed depends on your Apple Silicon chip. On M2 Pro and above, transcription approaches real-time speed (1 hour of audio takes roughly 1 hour to process). On a base M1, expect processing to take roughly twice as long as the recording runs. You can queue multiple files for batch processing overnight.

How well does it handle noisy body camera audio?

Whisper large-v3 is one of the most robust speech recognition models available. The original Whisper models were trained on 680,000 hours of diverse audio, including noisy environments, and large-v3 was trained on a substantially larger set. It handles wind noise, traffic, radio chatter, and overlapping voices well. For extremely degraded audio, FrameCounsel provides confidence scores per segment so you know which portions may need manual verification.
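How FrameCounsel computes its per-segment confidence scores is not described here. As a sketch of the general idea, Whisper-style output includes an average log-probability (`avg_logprob`) for each segment, which can be converted to a rough 0 to 1 score and compared against a threshold. The threshold of 0.6 and the name `flag_low_confidence` are assumptions for illustration.

```python
import math


def flag_low_confidence(
    segments: list[dict], min_confidence: float = 0.6
) -> list[tuple[float, float, float]]:
    """Return (start, end, confidence) for segments below the threshold."""
    flagged = []
    for seg in segments:
        # exp(average token log-probability) gives a rough 0..1 score.
        confidence = math.exp(seg["avg_logprob"])
        if confidence < min_confidence:
            flagged.append((seg["start"], seg["end"], round(confidence, 2)))
    return flagged
```

A reviewer could then listen only to the flagged time ranges instead of re-checking the whole recording.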

Can I correct the transcript?

Yes. FrameCounsel provides an integrated transcript editor where you can correct any errors, assign speaker names, and add annotations. Edits are tracked separately from the original AI-generated transcript, so the original is always preserved for integrity.

What happens if a video has no audio?

If a video file contains no audio track, FrameCounsel detects this and skips transcription for that file. It still processes the video for visual analysis features like object tracking, face recognition, and timeline building.

Related Content

Learn More & Go Deeper

Blog posts, case studies, and documentation related to this feature.

From the Blog

Technology

How MLX Whisper Transforms Body Camera Transcription

Read article

Technology

The Complete Guide to Air-Gapped Forensic Video Analysis

Read article

Case Studies

Case Study

Missing Miranda: AI Transcription Reveals Rights Never Read

Read case study

Documentation

Docs

Transcription

View docs

Ready to Use On-Device Video Transcription?

Download FrameCounsel and start using on-device video transcription on your next case. 30-day free trial. No credit card. 100% on-device.

Try It Free See All Features

100% on-device. Zero cloud. Your data never leaves your custody.
