Weekly multimodal AI roundup: Phi-4, Helios, LTX-2.3
OPEN_SOURCE
REDDIT · 29d ago · MODEL RELEASE

A weekly open-source multimodal AI digest surfaces several notable local model releases, led by Microsoft's MIT-licensed Phi-4-reasoning-vision-15B with strong math and UI reasoning. Also featured: Lightricks' LTX-2.3 video model with portrait mode support, and Helios, a 14B video model claiming real-time inference on a single GPU.

// ANALYSIS

The pace of capable open-weight multimodal releases has accelerated to the point that even a weekly roundup struggles to keep up, and that's a bullish signal for the local AI ecosystem.

  • Microsoft's Phi-4-reasoning-vision-15B is the headline: MIT-licensed, 15B parameters, targeting math, science, and UI reasoning — a strong open-weight alternative to proprietary vision models
  • Helios (PKU-YuanGroup) claims 14B video generation running real-time on one GPU with t2v/i2v/v2v up to a minute; the author flags the numbers as suspiciously good, worth independent verification
  • LTX-2.3 from Lightricks shows healthy community momentum — GGUF workflows, a desktop app, and a Linux port emerged within days of release
  • NEO-unify skipping traditional encoders entirely is an architectural bet worth watching; growing evidence that CLIP/SigLIP encoders may not be essential for multimodal models
  • Tencent's HY-WU delivering face swaps and style transfer without any fine-tuning is a practical win for deployment scenarios where per-user training is infeasible
// TAGS
multimodal · open-weights · video-gen · llm · reasoning · image-gen · open-source

DISCOVERED
2026-03-14 (29d ago)

PUBLISHED
2026-03-11 (31d ago)

RELEVANCE
8/10

AUTHOR
Vast_Yak_4147