Microsoft has launched MAI-Thinking-1, a Mixture of Experts reasoning model with 35 billion active parameters, trained from scratch.
Microsoft AI has introduced MAI-Thinking-1, a sparse Mixture of Experts model featuring 35 billion active parameters and a 128K context window. Trained entirely from scratch on commercially licensed, enterprise-grade data without distillation from third-party models, the model is designed to handle complex multi-step reasoning, mathematics, and software engineering. It matches Claude 3.5 Sonnet on the SWE-bench Pro coding benchmark and is available via Microsoft Foundry and Baseten.
Microsoft's decision to train a large Mixture of Experts (MoE) reasoning model entirely from scratch suggests a strategic push to reduce reliance on OpenAI and to offer enterprise clients highly steerable models with clean data provenance.
* Training a 35B-active-parameter MoE model without distillation is resource-intensive, but it yields weights that can be customized freely, without third-party licensing constraints.
* The model's SWE-bench Pro result lends support to Microsoft's system-level "Hill-Climbing Machine" pipeline for iterative performance optimization.
* Offering the model through Baseten enables a hybrid deployment option that appeals to teams seeking the control of self-hosting together with the robustness of managed infrastructure.
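For readers unfamiliar with the term, "active parameters" in a sparse MoE model means the weights actually used for each token: a router picks a few experts per token and the rest stay idle. The sketch below illustrates generic top-k routing only. It is not Microsoft's implementation, and the function names, expert count, `k`, and parameter sizes are all hypothetical, since the source does not disclose MAI-Thinking-1's routing details or total parameter count.

```python
import math


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Generic top-k gating; NOT taken from MAI-Thinking-1, whose router
    design is not described in the source.
    """
    # Indices of the k largest router scores, best first.
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected scores only, so the weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(num_experts, k, expert_params, shared_params):
    """Share of all parameters that a single token actually uses."""
    total = shared_params + num_experts * expert_params
    active = shared_params + k * expert_params
    return active / total


# Hypothetical router scores for one token across four experts.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print([index for index, _ in routes])

# Hypothetical sizes: 8 experts of 10 units each plus 20 shared units.
print(active_fraction(num_experts=8, k=2, expert_params=10, shared_params=20))
```

In this toy configuration only two of eight experts run per token, so the per-token compute tracks the active parameter count, not the full size of the model.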
DISCOVERED: 2026-06-03
PUBLISHED: 2026-06-03
AUTHOR: WorldofAI
