vLLM adds Cohere-Transcribe for efficient ASR
vLLM has integrated Cohere's new cohere-transcribe-03-2026 model, adding native support for high-throughput speech-to-text. Because the encoder accepts variable-length inputs, the integration avoids padding every clip to a fixed length, which cuts wasted compute on short audio.
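To make the padding argument concrete, here is a rough back-of-the-envelope sketch. It assumes a fixed 30-second window (the size Whisper pads to) and a handful of made-up clip durations; the function name `padding_waste` is hypothetical, and the numbers are illustrative rather than measurements of either model.

```python
def padding_waste(durations_s, window_s=30.0):
    """Return the fraction of encoder input that is padding when every clip
    is padded up to a fixed window of `window_s` seconds."""
    if not durations_s:
        raise ValueError("need at least one clip")
    if any(d <= 0 or d > window_s for d in durations_s):
        raise ValueError("each duration must be in (0, window_s]")
    padded_total = window_s * len(durations_s)  # seconds fed to a fixed-length encoder
    useful_total = sum(durations_s)             # seconds of real audio
    return 1.0 - useful_total / padded_total


# Illustrative clip lengths in seconds (assumed, not from any benchmark).
clips = [3.0, 9.0, 12.0, 30.0]
waste = padding_waste(clips)
print(f"padding share with a fixed 30 s window: {waste:.0%}")
```

With a variable-length encoder the padded total shrinks toward the real audio length, so the share computed here is roughly the compute a fixed-window model would spend on silence for this mix of clips.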
Cohere's move into ASR via vLLM positions the model as a direct competitor to Whisper: the integration is built around variable-length encoder inputs rather than fixed-length, padded inputs. Exposing it through the /v1/audio/transcriptions API lets developers serve both LLMs and ASR from a single engine, and native CohereAsrForConditionalGeneration support makes it a credible open-weights alternative to proprietary transcription APIs. vLLM's test suite also includes standardized English text normalizers, which suggests the integration is aimed at production enterprise deployments.
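As a minimal sketch of what calling that endpoint could look like, the snippet below builds a multipart POST to /v1/audio/transcriptions using only the Python standard library. The base URL, the exact model string passed to the server, and the helper names `build_transcription_request` and `transcribe` are assumptions for illustration; the `file` and `model` form fields and the `text` response field follow the OpenAI-style transcription API that vLLM mirrors.

```python
import json
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"   # assumption: a locally running vLLM server
MODEL = "cohere-transcribe-03-2026"  # assumption: the name the server was launched with


def build_transcription_request(audio: bytes, filename: str,
                                model: str = MODEL,
                                base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field naming the model to use.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # Form field carrying the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path: str) -> str:
    """Send an audio file to the server and return the transcribed text."""
    with open(path, "rb") as f:
        request = build_transcription_request(f.read(), path.rsplit("/", 1)[-1])
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

Calling `transcribe("meeting.wav")` against a running server would return the transcript as a string; the file name here is only an example.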
DISCOVERED: 2026-03-26 (17d ago)
PUBLISHED: 2026-03-26 (17d ago)
AUTHOR: LinkSea8324