OPEN_SOURCE
REDDIT · 26d ago · MODEL RELEASE
Prisma garage model beats GPT-2 on 30B tokens
Solo developer Yuri Ivatchkovitch releases Prisma, a 357M-parameter LM with a mirrored transformer architecture that outperforms GPT-2 Medium on 5/8 benchmarks using only 30B training tokens. Key innovations include G²LU nested gating, shared-weight mirrored layers, and Word-position Rotary Position Embedding (WoRPE).
// ANALYSIS
A one-person garage model with genuine architectural novelty outperforms established baselines on a fraction of the compute; if G²LU and WoRPE scale, they could quietly influence how mainstream architectures handle feature learning.
- G²LU (Gated-Gated Linear Unit) replaces SwiGLU with nested gates, enabling 100x higher learning rates and creating saddle-surface decision boundaries that resist memorization
- Mirrored weight sharing achieves 2N virtual layers from N unique parameter sets — a parameter-efficient trick preserving representational depth without added cost
- WoRPE encodes word-boundary position geometrically in attention heads, surfacing information already latent in BPE tokenization without a new tokenizer
- Beats GPT-2 Medium (40B tokens) on 5/8 benchmarks with only 30B tokens, with particular strength on reasoning tasks (ARC-Easy +11pp, BoolQ 0.620)
- Key caveats: untested beyond ~350M scale, single-developer maintenance, and minimal community traction (2 Reddit upvotes, 230 HF downloads/month)
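The release summary names the two core architectural tricks but not their exact equations, so the following is a minimal sketch of how a nested-gate feed-forward block and mirrored weight sharing *might* fit together. The nested-gate formulation (an outer sigmoid gate modulating an inner SwiGLU-style SiLU gate) and all weight names are assumptions for illustration, not Prisma's published implementation.

```python
import numpy as np

def silu(x):
    # SiLU/swish activation, as used in SwiGLU-style blocks
    return x / (1.0 + np.exp(-x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def g2lu_block(x, W_gate_inner, W_gate_outer, W_value, W_out):
    """Hypothetical nested-gate ("gated-gated") feed-forward: an outer
    sigmoid gate modulates the inner SiLU gate before it multiplies the
    value path. Weight names are illustrative, not from the release."""
    inner = silu(x @ W_gate_inner)     # inner gate, as in SwiGLU
    outer = sigmoid(x @ W_gate_outer)  # second, nested gate
    return ((outer * inner) * (x @ W_value)) @ W_out

def mirrored_forward(x, layers):
    """Shared-weight mirroring: run the N unique layers forward, then
    again in reverse order, yielding 2N virtual layers from N
    parameter sets at no extra parameter cost."""
    for layer in layers:
        x = layer(x)
    for layer in reversed(layers):
        x = layer(x)
    return x

# Toy usage: 4 tokens, model width 8, hidden width 16, 3 unique layers
rng = np.random.default_rng(0)
d, h = 8, 16

def make_layer(rng, d, h):
    Wgi = rng.standard_normal((d, h)) * 0.1
    Wgo = rng.standard_normal((d, h)) * 0.1
    Wv = rng.standard_normal((d, h)) * 0.1
    Wo = rng.standard_normal((h, d)) * 0.1
    return lambda t: g2lu_block(t, Wgi, Wgo, Wv, Wo)

layers = [make_layer(rng, d, h) for _ in range(3)]
x = rng.standard_normal((4, d))
y = mirrored_forward(x, layers)
print(y.shape)  # (4, 8) — 6 virtual layer applications from 3 parameter sets
```

Note that a real implementation would interleave these blocks with attention and normalization; the sketch only shows the parameter-sharing arithmetic behind the "2N virtual layers" claim.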
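WoRPE's exact formulation is likewise not given in this summary. A plausible reading, sketched below, is that it reuses standard RoPE rotations but keys them to a *word* index recovered from BPE boundaries (assuming, as in GPT-2's byte-level BPE, that tokens carrying a leading space start a new word). Both the boundary heuristic and the function names are assumptions.

```python
import numpy as np

def word_positions(tokens):
    """Derive a word index per BPE token, assuming a leading space
    marks the start of a new word (GPT-2-style byte-level BPE)."""
    pos, idx = [], -1
    for t in tokens:
        if t.startswith(" ") or idx < 0:
            idx += 1
        pos.append(idx)
    return np.array(pos)

def rope_rotate(q, positions, base=10000.0):
    """Standard RoPE rotation over half-dimension pairs, but driven by
    word positions instead of raw token positions (a guess at WoRPE)."""
    half = q.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # (half,)
    ang = positions[:, None] * freqs[None, :]   # (seq, half)
    cos, sin = np.cos(ang), np.sin(ang)
    q1, q2 = q[..., :half], q[..., half:]
    return np.concatenate([q1 * cos - q2 * sin,
                           q1 * sin + q2 * cos], axis=-1)

tokens = ["The", " quick", " bro", "wn", " fox"]
pos = word_positions(tokens)
print(pos.tolist())  # [0, 1, 2, 2, 3] — "bro"/"wn" share one word index
```

The point of the trick, per the summary, is that this word-boundary signal is already latent in the tokenization, so no new tokenizer is needed; only the attention heads' position rotation changes.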
// TAGS
prisma · llm · open-weights · open-source · research · transformer
DISCOVERED
26d ago (2026-03-16)
PUBLISHED
31d ago (2026-03-11)
RELEVANCE
6/10
AUTHOR
y3i12