Microsoft releases MAI-Voice-1 with 60x real-time TTS
Microsoft's first-party MAI-Voice-1 text-to-speech model generates 60 seconds of high-fidelity, emotionally nuanced audio in under one second. Supporting zero-shot voice cloning and fine-grained SSML control, it is now available in public preview via Azure AI Foundry and Copilot Daily.
MAI-Voice-1 is a strategic step toward AI self-sufficiency for Microsoft, undercutting specialized competitors on both speed and price. Its 60x real-time throughput (about 60 seconds of audio per second of generation) is a clear differentiator for developers building real-time voice agents, while pricing of $22 per million characters positions it as an aggressive enterprise alternative to ElevenLabs. Integration into the broader MAI stack lets Microsoft offer a vertically integrated, low-latency speech pipeline within the Azure ecosystem.
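The two headline figures reduce to simple arithmetic. The following is a minimal sketch, using only the numbers quoted above (60 seconds of audio in under one second, $22 per million characters). The function names are hypothetical helpers for illustration and are not part of any Microsoft or Azure API. Because generation is described as taking "under one second", the 60x figure is best read as a lower bound.

```python
# Back-of-the-envelope helpers for the figures quoted in this article.
# The names below are illustrative only, not part of any Azure SDK.

PRICE_PER_MILLION_CHARS_USD = 22.0  # price quoted in the article


def real_time_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Seconds of audio produced per second of generation time."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


def synthesis_cost_usd(
    num_chars: int,
    price_per_million: float = PRICE_PER_MILLION_CHARS_USD,
) -> float:
    """Estimated cost in USD of synthesizing num_chars characters of text."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1_000_000 * price_per_million


if __name__ == "__main__":
    # 60 seconds of audio in (at most) 1 second of generation
    print(f"real-time factor: {real_time_factor(60.0, 1.0):.0f}x")
    # a hypothetical 50,000-character script
    print(f"cost: ${synthesis_cost_usd(50_000):.2f}")
```

At the quoted price, a hypothetical 50,000-character script would cost roughly $1.10 to synthesize; actual billing rules (minimums, rounding, SSML markup counting) are not stated in the source and are not modeled here.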
DISCOVERED
2026-04-06
PUBLISHED
2026-04-06
AUTHOR
AI Revolution