OPEN_SOURCE
YT · YOUTUBE // 14d ago · MODEL RELEASE
daVinci-MagiHuman drops open-source human video model
daVinci-MagiHuman is a 15B open-source audio-video foundation model for human-centric generation. It uses a single-stream Transformer to sync speech, facial performance, and body motion, and ships the full stack with multilingual support plus fast-inference tooling.
// ANALYSIS
This is the kind of release that makes human-video generation feel less like a demo and more like a system other teams can actually build on. The architecture is the real story: simplifying multimodal fusion may matter more than any single benchmark number.
- Single-stream self-attention removes cross-attention plumbing, which should make training and debugging simpler.
- The speed claims are real but hardware-bound: generating 5 seconds of 256p video in 2 seconds is strong, but 1080p still needs a much heavier second stage.
- Pairwise wins over Ovi 1.1 and LTX 2.3 suggest competitive quality, though the evaluation is still early and likely curated.
- Releasing the base, distilled, SR, and inference stack makes this much more useful to researchers than a paper-only announcement.
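To make the first point concrete: in a single-stream design, tokens from every modality are concatenated into one sequence and passed through ordinary self-attention, so audio-video alignment emerges from the same attention maps rather than from dedicated cross-attention layers. The NumPy sketch below is a toy illustration of that idea, not the model's actual code; the token counts, dimension, and the omission of projection matrices and multi-head structure are all simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, d):
    # toy single-head attention; real models add learned Q/K/V projections
    scores = x @ x.T / np.sqrt(d)
    return softmax(scores) @ x

rng = np.random.default_rng(0)
d = 8                                  # toy embedding dimension (assumption)
audio = rng.normal(size=(4, d))        # 4 audio tokens
video = rng.normal(size=(6, d))        # 6 video tokens

# Single-stream fusion: concatenate the modality tokens and run one joint
# self-attention pass. Every token attends to every other token, so
# audio-video interaction needs no separate cross-attention module.
stream = np.concatenate([audio, video], axis=0)
fused = self_attention(stream, d)
print(fused.shape)                     # one fused sequence: (10, 8)
```

The practical upside noted in the analysis is that there is only one attention code path to train, profile, and debug; the trade-off is that attention cost grows with the combined sequence length of all modalities.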
// TAGS
multimodal · video-gen · audio-gen · speech · open-source · davinci-magihuman
DISCOVERED
2026-03-28 (14d ago)
PUBLISHED
2026-03-28 (14d ago)
RELEVANCE
9/10
AUTHOR
Github Awesome