Microsoft releases Harrier-OSS-v1 multilingual embeddings
Microsoft released Harrier-OSS-v1, a family of decoder-only multilingual text embedding models that support 94 languages and a 32k-token context window. The models achieve state-of-the-art results on Multilingual MTEB v2 through architectural changes and knowledge distillation.
Microsoft is moving aggressively into the open-weights embedding space, and these results strengthen the case for decoder-only backbones in retrieval.
- The 32k-token context window reduces the need for complex chunking strategies and mitigates "lost in the middle" issues in long-form RAG.
- The smaller variants (270M, 0.6B) punch far above their weight thanks to distillation from backbones with 27B+ parameters.
- Support for 94 languages and instruction tuning make it a versatile foundation for global retrieval and classification tasks.
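To make the chunking point concrete, here is a minimal sketch of how many embedding windows a document needs at different context sizes. It assumes "32k" means 32,768 tokens and uses a sliding window with a fixed overlap. The function name `chunks_needed` and the 512-token comparison window are illustrative choices, not part of the Harrier-OSS-v1 release.

```python
from math import ceil

# Assumption: "32k" is read as 32,768 tokens; the release may define it differently.
ASSUMED_CONTEXT_TOKENS = 32_768


def chunks_needed(doc_tokens: int, window: int = ASSUMED_CONTEXT_TOKENS, overlap: int = 0) -> int:
    """Count the sliding windows needed to cover a document of doc_tokens tokens.

    Hypothetical helper for illustration: neighbouring windows share `overlap` tokens.
    """
    if doc_tokens < 0:
        raise ValueError("doc_tokens must be non-negative")
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if doc_tokens <= window:
        return 1  # the whole document fits in a single embedding call
    stride = window - overlap  # new tokens covered by each additional window
    return 1 + ceil((doc_tokens - window) / stride)


if __name__ == "__main__":
    doc = 20_000  # e.g. a long report, measured in tokens
    print("512-token window, 64 overlap:", chunks_needed(doc, window=512, overlap=64))
    print("assumed 32k window:", chunks_needed(doc))
```

Under these assumptions a 20,000-token document fits in one window at 32k, whereas a 512-token window needs dozens of overlapping chunks. Documents longer than the window still need to be split, so the larger context reduces chunking rather than removing it entirely.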
DISCOVERED
2026-04-02
PUBLISHED
2026-04-02
AUTHOR
AI Revolution