Insanely Fast Whisper slashes transcription latency
GH · GITHUB // 17d ago // OPEN-SOURCE RELEASE

Insanely Fast Whisper is a community-driven, on-device CLI for fast speech-to-text with Whisper, built on Transformers, Optimum, and Flash Attention 2. The repo claims it can transcribe 150 minutes of audio in under 98 seconds on an A100 and works across CUDA GPUs and Apple Silicon via MPS.
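As a rough sketch of what the packaging looks like in practice (flag names taken from the repo's README at the time of writing; verify against `insanely-fast-whisper --help` before relying on them):

```shell
# Install the CLI (the repo recommends pipx).
pipx install insanely-fast-whisper

# Transcribe a local file with word-level timestamps.
# --batch-size should be tuned to available VRAM; smaller values
# trade throughput for memory headroom.
insanely-fast-whisper \
  --file-name audio.mp3 \
  --model-name openai/whisper-large-v3 \
  --batch-size 24 \
  --timestamp word \
  --transcript-path transcript.json
```

On Apple Silicon the repo documents a `--device-id mps` path instead of CUDA, with the Flash Attention 2 option restricted to supported NVIDIA GPUs.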

// ANALYSIS

This is the rare benchmark-heavy OSS project that actually looks shippable. It packages Whisper optimization into a normal CLI instead of leaving it as a notebook demo.

  • It supports file or URL input, word-level timestamps, and optional speaker diarization, so it covers more than raw transcription.
  • The speed story is real but hardware-dependent; batch size, model choice, and backend matter a lot, so this is a throughput tool rather than magic.
  • Staying inside the Hugging Face stack lowers adoption friction for teams already using Transformers, Optimum, or flash-attn.
  • Mac support broadens the audience, but the most aggressive numbers still look like CUDA territory.
  • Downstream wrappers and blog coverage suggest it solved a practical pain point, not just a benchmark vanity project.
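The headline claim works out to roughly a 90x real-time factor, which is worth sanity-checking before comparing against other backends (a quick back-of-envelope calculation, using only the numbers quoted above):

```python
# Sanity-check the headline claim: 150 minutes of audio in 98 seconds.
audio_seconds = 150 * 60       # 9000 s of input audio
wall_clock_seconds = 98        # claimed transcription time on an A100

real_time_factor = audio_seconds / wall_clock_seconds
print(f"~{real_time_factor:.0f}x faster than real time")  # ~92x
```

Any real-world number will be lower once model download, audio decoding, and diarization are included; the claim covers the transcription pass itself.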
// TAGS
insanely-fast-whisper · speech · cli · open-source · inference · devtool

DISCOVERED

2026-03-26 (17d ago)

PUBLISHED

2026-03-26 (17d ago)

RELEVANCE

8/10