daVinci-MagiHuman drops open-source human video model
OPEN_SOURCE
YT · YOUTUBE // 14d ago // MODEL RELEASE

daVinci-MagiHuman is a 15B-parameter open-source audio-video foundation model for human-centric generation. It uses a single-stream Transformer to synchronize speech, facial performance, and body motion, and it ships the full stack: multilingual support plus fast-inference tooling.

// ANALYSIS

This is the kind of release that makes human-video generation feel less like a demo and more like a system other teams can actually build on. The architecture is the real story: simplifying multimodal fusion may matter more than any single benchmark number.

  • Single-stream self-attention removes cross-attention plumbing, which should make training and debugging simpler.
  • The speed claims are real but hardware-bound: generating 5 seconds of 256p video in about 2 seconds is strong, but 1080p output still requires a much heavier second stage.
  • Pairwise wins over Ovi 1.1 and LTX 2.3 suggest competitive quality, though the evaluation is still early and likely curated.
  • Releasing the base, distilled, SR, and inference stack makes this much more useful to researchers than a paper-only announcement.
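
The single-stream point in the first bullet can be made concrete. Below is a minimal sketch (hypothetical shapes, names, and weights; not the release's actual code) of the idea: audio and video tokens are concatenated into one sequence and mixed by ordinary self-attention, so no dedicated cross-attention blocks are needed.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def single_stream_fusion(audio_tokens, video_tokens, d_model, seed=0):
    """One self-attention pass over the concatenated modality sequence.

    Illustrative only: in a single-stream design, cross-modal mixing
    falls out of plain self-attention over the unified token stream.
    """
    # (N_audio + N_video, d_model): one unified sequence.
    x = np.concatenate([audio_tokens, video_tokens], axis=0)

    # Random projections stand in for learned weights.
    rng = np.random.default_rng(seed)
    w_q, w_k, w_v = (
        rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        for _ in range(3)
    )

    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Every token attends to every other token, regardless of modality.
    attn = softmax(q @ k.T / np.sqrt(d_model))
    return attn @ v

# Toy shapes: 4 audio tokens, 6 video tokens, model width 8.
fused = single_stream_fusion(np.ones((4, 8)), np.ones((6, 8)), 8)
assert fused.shape == (10, 8)
```

A two-stream design would instead need separate per-modality blocks plus cross-attention in each direction; collapsing that into one sequence is what makes training and debugging simpler.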
// TAGS
multimodal · video-gen · audio-gen · speech · open-source · davinci-magihuman

DISCOVERED

14d ago

2026-03-28

PUBLISHED

14d ago

2026-03-28

RELEVANCE

9/10

AUTHOR

Github Awesome