Voicet debuts ultrafast local speech-to-text
REDDIT · OPEN-SOURCE RELEASE · 19d ago

Voicet is a Rust-based local realtime speech-to-text app built on Mistral's Voxtral Mini 4B Realtime model. It targets NVIDIA RTX 3000-series or newer GPUs with 11 GB+ VRAM, lets users dictate into any app with no cloud calls or API keys, and the author is still looking for testers.

// ANALYSIS

This is a serious latency demo first and a polished consumer app second, which is fine: local speech-to-text only becomes compelling when it feels instant.

  • The engineering story is the real hook: Rust, Candle, CUDA, and a vendored flash-attn fork keep the runtime much smaller than the Python/Transformers stack.
  • The workflow looks genuinely useful for power users: live transcription, type-into-any-app mode, hotkey pause/resume, automatic paragraph breaks, and offline WAV transcription.
  • The author's performance claims are compelling but self-reported: the repo cites 5x realtime on RTX 5080 and much faster startup than Python, while the Reddit post says it beats Mistral's demo and Speechmatics.
  • It is CUDA-only and hardware-gated today: RTX 3000+ with 11GB+ VRAM, Windows/Linux support, and DGX Spark still untested.
  • Using the full BF16 model helps explain the speed/quality story, but it also explains the GPU requirement and power draw.
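Two of the claims above can be sanity-checked with back-of-envelope arithmetic (assumptions: "4B" means roughly 4e9 parameters, and "5x realtime" means audio duration divided by processing time):

```python
# BF16 stores each weight in 2 bytes, so a ~4B-parameter model needs
# about 8 GB of VRAM for weights alone, before KV cache and activations --
# consistent with the 11 GB+ requirement.
params = 4e9
weight_gb = params * 2 / 1e9
print(f"weights: ~{weight_gb:.0f} GB")   # → weights: ~8 GB

# 5x realtime means a 60-second clip transcribes in about 12 seconds.
audio_s = 60
processing_s = audio_s / 5
print(f"60 s clip: ~{processing_s:.0f} s")  # → 60 s clip: ~12 s
```

The numbers are rough (quantized variants would shrink the weight footprint considerably), but they explain why the full-precision model pins the hardware floor where it is.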
// TAGS
voicet · speech · gpu · inference · open-source · self-hosted

DISCOVERED
2026-03-23

PUBLISHED
2026-03-23

RELEVANCE
8/10

AUTHOR
okashiraa