Prisma garage model beats GPT-2 on 30B tokens
OPEN_SOURCE
REDDIT // 26d ago · MODEL RELEASE

Solo developer Yuri Ivatchkovitch releases Prisma, a 357M-parameter LM with a mirrored transformer architecture that outperforms GPT-2 Medium on 5/8 benchmarks using only 30B training tokens. Key innovations include G²LU nested gating, shared-weight mirrored layers, and Word-position Rotary Position Embedding (WoRPE).

// ANALYSIS

A one-person garage model with genuine architectural novelty outperforms established baselines on a fraction of the compute. If G²LU and WoRPE scale, they could quietly influence how mainstream architectures handle feature learning.

  • G²LU (Gated-Gated Linear Unit) replaces SwiGLU with nested gates, enabling 100x higher learning rates and creating saddle-surface decision boundaries that resist memorization
  • Mirrored weight sharing achieves 2N virtual layers from N unique parameter sets — a parameter-efficient trick preserving representational depth without added cost
  • WoRPE encodes word-boundary position geometrically in attention heads, surfacing information already latent in BPE tokenization without a new tokenizer
  • Beats GPT-2 Medium (40B tokens) on 5/8 benchmarks with only 30B tokens, with particular strength on reasoning tasks (ARC-Easy +11pp, BoolQ 0.620)
  • Key caveats: untested beyond ~350M scale, single-developer maintenance, and minimal community traction (2 Reddit upvotes, 230 HF downloads/month)
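Of the bullets above, the mirrored weight-sharing trick is the easiest to illustrate. Below is a minimal sketch of the general idea, not Prisma's actual implementation: N unique parameter sets are traversed forward and then in reverse, so the stack applies 2N layer transformations while storing only N sets of weights. The layer internals here (a single residual linear + GELU per layer) and all names are illustrative assumptions; the summary does not specify Prisma's block structure.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class MirroredStack:
    """Toy mirrored-weight-sharing stack: 2N virtual layers from N parameter sets.

    Hypothetical sketch of the shared-weight mirrored-layer idea; not Prisma's code.
    """

    def __init__(self, n_layers, d_model, seed=0):
        rng = np.random.default_rng(seed)
        # N unique parameter sets (toy: one weight matrix per layer)
        self.weights = [rng.standard_normal((d_model, d_model)) * 0.02
                        for _ in range(n_layers)]

    def forward(self, x):
        applied = 0
        # Forward through layers 0..N-1, then mirrored back N-1..0,
        # reusing the same weight matrices: 2N applications, N stored sets.
        for w in self.weights + self.weights[::-1]:
            x = x + gelu(x @ w)  # residual connection
            applied += 1
        return x, applied

stack = MirroredStack(n_layers=4, d_model=8)
out, virtual_depth = stack.forward(np.ones((2, 8)))
print(virtual_depth)  # 8 layer applications from 4 unique parameter sets
```

The payoff is the parameter count: memory and checkpoint size scale with N while representational depth scales with 2N, which matches the "2N virtual layers from N unique parameter sets" claim above.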
// TAGS
prisma · llm · open-weights · open-source · research · transformer

DISCOVERED

26d ago

2026-03-16

PUBLISHED

31d ago

2026-03-11

RELEVANCE

6 / 10

AUTHOR

y3i12