Parakeet TDT v3 supports 25 languages at 3000x real-time speed
NVIDIA's latest Token-and-Duration Transducer (TDT) models achieve 3000x real-time throughput, with v3 expanding support to 25 languages while v2 remains the precision choice for English-only tasks.
Parakeet TDT is the speed king of ASR, effectively rendering Whisper-based pipelines obsolete for high-volume tasks, but it prioritizes "clean" readability over a verbatim record of the audio. Version 2 remains the precision choice for English-only tasks, with a 6.05% word error rate (WER) compared to v3's 6.32%. v3's primary value lies in its 25-language coverage and its improved robustness against hallucinated text on non-speech audio. Both models aggressively filter filler words such as "um" and "uh," making them unsuitable where legal or clinical work requires verbatim transcripts. RTFx is the ratio of audio duration to processing time, so 3000x RTFx means one hour of audio is processed in about 1.2 seconds (3600 s / 3000), enabling massive-scale transcription at negligible cost.
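The throughput and accuracy figures above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of any NVIDIA tooling: the function names `processing_seconds` and `relative_wer_gap` are hypothetical, and it assumes RTFx is defined as audio duration divided by processing time.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Wall-clock seconds to transcribe audio at a given RTFx.

    Assumes RTFx = audio duration / processing time.
    """
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


def relative_wer_gap(wer_a: float, wer_b: float) -> float:
    """Relative increase of wer_b over wer_a, as a fraction of wer_a."""
    if wer_a <= 0:
        raise ValueError("wer_a must be positive")
    return (wer_b - wer_a) / wer_a


# One hour of audio at the quoted 3000x RTFx.
one_hour = processing_seconds(3600, 3000)

# v3 (6.32% WER) relative to v2 (6.05% WER), figures taken from the summary.
gap = relative_wer_gap(6.05, 6.32)

print(f"1 h of audio: {one_hour:.1f} s")
print(f"v3 WER is {gap:.1%} higher than v2 in relative terms")
```

At these figures the accuracy gap is 0.27 percentage points in absolute terms, so choosing between v2 and v3 comes down mostly to whether multilingual coverage is needed.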
DISCOVERED
2026-04-04
PUBLISHED
2026-04-03
AUTHOR
walleynguyen