OPEN_SOURCE · INFRASTRUCTURE
REDDIT · 9d ago

Modular MAX lands Gemma 4, beats vLLM

Modular says it had Gemma 4 running on its MAX inference stack on launch day across NVIDIA B200 and AMD MI355X, using the same serving layer for both vendors. On B200, it reports 15% higher output throughput than vLLM, while Gemma 4 itself brings 256K context, native multimodality, and open Apache 2.0 weights.

// ANALYSIS

The interesting part here is less the model release than the infrastructure story: Modular is positioning MAX as the portable serving layer for heterogeneous datacenter fleets, not a one-off benchmark harness.

  • Day-zero support for both NVIDIA Blackwell and AMD Instinct hardware is the real differentiator for teams that do not want to maintain separate stacks per vendor
  • The 15% win over vLLM is credible only if the methodology is disclosed; decode mix, batching, quantization, and context length can each move throughput materially (see the measurement sketch after this list)
  • Gemma 4’s 256K context and multimodal inputs raise serving complexity, so a unified inference stack matters more than raw model compatibility
  • Apache 2.0 licensing makes Gemma 4 easier to adopt in private and commercial deployments, which helps infrastructure vendors like Modular sell the portability story
  • This reads as a platform proof point for MAX: open models, OpenAI-compatible serving, and GPU-agnostic deployment in one stack
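
To make the methodology caveat concrete, here is a minimal sketch of a single-request throughput measurement against an OpenAI-compatible endpoint like the one MAX exposes. The base URL, API key, model id, and prompt are placeholder assumptions for illustration, not Modular's published benchmark configuration:

```python
import time

from openai import OpenAI

# Placeholder endpoint and credentials: MAX serves an OpenAI-compatible
# API, so the same client code targets it regardless of whether the
# backend GPU is an NVIDIA B200 or an AMD MI355X. The URL, key, and
# model id below are assumptions, not Modular's benchmark setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def output_throughput(prompt: str, max_tokens: int = 512) -> float:
    """Output tokens per second for one unbatched request."""
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="gemma-4",  # hypothetical model id
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    elapsed = time.perf_counter() - start
    return resp.usage.completion_tokens / elapsed

print(f"{output_throughput('Explain KV caching in one paragraph.'):.1f} tok/s")
```

Even a toy harness like this surfaces the knobs that move the headline number: max_tokens fixes the decode length, the prompt fixes the prefill cost, and a single unbatched request says nothing about the batched steady-state throughput that vendor comparisons usually quote.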
// TAGS
max · gemma-4 · inference · gpu · multimodal · benchmark · open-source

DISCOVERED: 2026-04-02 (9d ago)

PUBLISHED: 2026-04-02 (9d ago)

RELEVANCE: 9/10

AUTHOR: carolinedfrasca