REDDIT // 13d ago // BENCHMARK RESULT
Deepgram beats ElevenLabs, AssemblyAI in Swedish diarization
On a real 2h22m Swedish meeting with six speakers, Deepgram delivered the best diarization balance: 92%+ accuracy, full speaker coverage, and much faster turnaround than AssemblyAI. ElevenLabs' Swedish transcription read cleaner, but its diarization missed two of the six speakers outright.
// ANALYSIS
This is the kind of benchmark that matters because it tests a real long-form meeting, not a synthetic clip. For multilingual voice apps, the winning stack is often one that handles transcription and speaker diarization with separate models instead of forcing one vendor to do both.
- Deepgram was the only option here to keep all 6 speakers and stay around 92% diarization accuracy while finishing in under a minute.
- ElevenLabs' Swedish text quality looks better in practice, but 32.8% time accuracy and 4/6 speakers make its diarizer a no-go for serious meeting apps.
- AssemblyAI is close on raw accuracy, but 218-303 second runtimes are hard to justify when latency matters.
- PyannoteAI Precision-2 may look stronger on paper, but async, job-based execution pushes it out of the usable-now bucket for real-time or near-real-time pipelines.
- The practical play is a hybrid pipeline: use one model for Swedish transcription, another for diarization, then align the outputs downstream.
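The downstream alignment step in that hybrid pipeline can be sketched roughly as follows. This is a minimal illustration, not any vendor's API: it assumes the transcription model returns word-level timestamps and the diarizer returns speaker turns, and the `Word` and `Turn` types, `assign_speakers`, and `merge_utterances` are hypothetical names invented for the example. Each word is given to the speaker whose turn overlaps it most in time.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with timestamps in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float


@dataclass
class Turn:
    """One diarized speaker turn in seconds (hypothetical shape)."""
    speaker: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals, 0.0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns, unknown="unknown"):
    """Label each word with the speaker whose turn overlaps it the most."""
    labeled = []
    for w in words:
        best_speaker, best_overlap = unknown, 0.0
        for t in turns:
            ov = overlap(w.start, w.end, t.start, t.end)
            if ov > best_overlap:
                best_speaker, best_overlap = t.speaker, ov
        labeled.append((best_speaker, w.text))
    return labeled


def merge_utterances(labeled):
    """Collapse consecutive words from the same speaker into one utterance."""
    utterances = []
    for speaker, text in labeled:
        if utterances and utterances[-1][0] == speaker:
            utterances[-1] = (speaker, utterances[-1][1] + " " + text)
        else:
            utterances.append((speaker, text))
    return utterances
```

Calling `merge_utterances(assign_speakers(words, turns))` yields a speaker-attributed transcript. Words that fall inside no diarized turn are labeled `"unknown"` rather than guessed, which makes gaps in speaker coverage visible instead of hiding them.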
// TAGS
speech, benchmark, api, deepgram, elevenlabs, assemblyai
DISCOVERED
13d ago
2026-03-29
PUBLISHED
13d ago
2026-03-29
RELEVANCE
8/10
AUTHOR
invismanfow