Turkish Call Centers Weigh Whisper, Qwen ASR
OPEN_SOURCE
REDDIT · 3h ago · INFRASTRUCTURE


The post asks which STT foundation model makes the most sense for a Turkish call-center pipeline: Whisper Large v3, Whisper Large v3 Turbo, or Qwen3-ASR 1.7B/0.6B. The real trade-off is not raw accuracy alone, but whether the team wants a mature CPU-serving path or a GPU-first stack that can handle 20-30 concurrent near-real-time requests.

// ANALYSIS

The hottest take: this is more of an infrastructure decision than a model-benchmark decision. If the deployment target is real-time concurrency at 20-30 streams, GPU is the safer default; CPU-only is doable for some Whisper setups, but it is much harder to make predictable at call-center latency.
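A quick way to see why GPU is the safer default at this concurrency target is to do the capacity math. The sketch below is a back-of-envelope estimate, not from the post; the real-time-factor (RTF) figures in the comments are illustrative assumptions, and real numbers depend heavily on model size, quantization, and hardware.

```python
def max_concurrent_streams(rtf: float, workers: int) -> int:
    """Rough ceiling on live streams a deployment can sustain.

    rtf is the engine's real-time factor: seconds of compute per
    second of audio (lower is faster). Each worker can serve about
    1 / rtf streams before it falls behind real time.
    """
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return workers * int(round(1.0 / rtf))

# Illustrative figures: a GPU worker at rtf = 0.05 (20x real time)
# covers ~20 streams, right at the post's 20-30 target; a CPU worker
# at rtf = 0.5 covers ~2, so CPU-only needs an order of magnitude
# more workers to hit the same concurrency.
gpu_capacity = max_concurrent_streams(rtf=0.05, workers=1)
cpu_capacity = max_concurrent_streams(rtf=0.5, workers=1)
```

This ignores batching gains and queueing effects, but it shows why the concurrency target, not headline WER, drives the hardware decision.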

  • Whisper has the most mature self-hosted inference ecosystem for CPU and GPU, with `whisper.cpp` for CPU-only and `faster-whisper`/CTranslate2 for efficient quantized serving.
  • Qwen3-ASR looks compelling on paper for Turkish because the official model card explicitly supports Turkish, but the official repo pushes `qwen-asr` plus vLLM and FlashAttention, which points to a GPU-centered deployment path.
  • Whisper Large v3 Turbo is the obvious throughput knob inside the Whisper family, while Large v3 is the safer accuracy-first baseline if you can afford the compute.
  • Fine-tuning on 91 hours of real call-center audio is likely to matter more than small architecture differences, but only if the serving engine can preserve timestamps, batching behavior, and stability under load.
  • The biggest deployment risk is operational: streaming segmentation, memory/VRAM footprint, and tail latency under concurrency will matter more than headline WER once this is behind an STT -> LLM -> TTS pipeline.
// TAGS
whisper · qwen3-asr · speech · inference · gpu · fine-tuning · open-source

DISCOVERED

3h ago

2026-04-25

PUBLISHED

4h ago

2026-04-24

RELEVANCE

8 / 10

AUTHOR

iamtamerr