OPEN_SOURCE
REDDIT // 26d ago · BENCHMARK RESULT
Thai MTEB reveals Qwen3-4B leads, tiny 0.6B surprises
A community-run Thai MTEB benchmark tests 14 embedding models across 15 Thai tasks on Thailand's LANTA national supercomputer, with results merged into the official MTEB repo. Qwen3-Embedding-4B tops the chart at 74.41, while Qwen3-0.6B impresses by nearly matching much larger models at a fraction of the size.
// ANALYSIS
Thai is chronically under-benchmarked, and this leaderboard fills that gap with national supercomputing resources and official MTEB integration, a meaningful contribution. The real story, though, is how decisively Qwen3 dominates.
- Qwen3-Embedding-4B outperforms KaLM-Gemma3-12B (Tencent's multilingual SOTA) by 0.5 points despite being 3× smaller, a strong efficiency win
- Qwen3-Embedding-0.6B scores 69.08, only about 5 points behind the 4B model, making it an attractive choice for resource-constrained Thai deployments
- BOOM_4B_v1 at 71.84 is the highest-ranking Thai-specific model, showing local fine-tuning still competes with general multilingual giants
- bge-m3, long a go-to for multilingual retrieval, scores a modest 64.77, suggesting it wasn't trained with enough Thai data to stay competitive
- Jina-v3 lags significantly at 57.81 despite being newer, hinting at Thai coverage gaps in its training
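The scores above aggregate many task types, but the retrieval tasks that dominate practical deployments boil down to ranking documents by embedding similarity. A minimal sketch of that ranking step, using cosine similarity (the standard metric in MTEB-style retrieval) with made-up toy vectors in place of real model embeddings:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: one query and three candidate documents.
# A real evaluation would produce these with an embedding model.
query = np.array([0.9, 0.1, 0.2])
docs = {
    "relevant":   np.array([0.8, 0.2, 0.1]),
    "related":    np.array([0.5, 0.5, 0.3]),
    "irrelevant": np.array([0.0, 0.9, 0.9]),
}

# Rank documents by similarity to the query, as a retrieval task would;
# benchmark metrics like nDCG are then computed over this ranking.
ranking = sorted(docs, key=lambda d: cosine_sim(query, docs[d]), reverse=True)
print(ranking)  # → ['relevant', 'related', 'irrelevant']
```

A model's leaderboard score reflects how often its embeddings put the truly relevant documents at the top of rankings like this, averaged across tasks.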
// TAGS
embedding · benchmark · llm · multilingual · open-source · qwen3-embedding · thai-mteb-leaderboard
DISCOVERED
26d ago
2026-03-16
PUBLISHED
27d ago
2026-03-16
RELEVANCE
7/10
AUTHOR
anusoft