OPEN_SOURCE
REDDIT · 26d ago · MODEL RELEASE
Prisma garage model beats GPT-2 on 30B tokens
Solo developer Yuri Ivatchkovitch releases Prisma, a 357M-parameter LM with a mirrored transformer architecture that outperforms GPT-2 Medium on 5/8 benchmarks using only 30B training tokens. Key innovations include G²LU nested gating, shared-weight mirrored layers, and Word-position Rotary Position Embedding (WoRPE).
// ANALYSIS
A one-person garage model with genuine architectural novelty outperforms established baselines on a fraction of the compute; if G²LU and WoRPE scale, they could quietly influence how mainstream architectures handle feature learning.
- G²LU (Gated-Gated Linear Unit) replaces SwiGLU with nested gates, enabling 100x higher learning rates and creating saddle-surface decision boundaries that resist memorization
- Mirrored weight sharing achieves 2N virtual layers from N unique parameter sets — a parameter-efficient trick preserving representational depth without added cost
- WoRPE encodes word-boundary position geometrically in attention heads, surfacing information already latent in BPE tokenization without a new tokenizer
- Beats GPT-2 Medium (40B tokens) on 5/8 benchmarks with only 30B tokens, with particular strength on reasoning tasks (ARC-Easy +11pp, BoolQ 0.620)
- Key caveats: untested beyond ~350M scale, single-developer maintenance, and minimal community traction (2 Reddit upvotes, 230 HF downloads/month)
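The release summary names the two core architectural tricks but not their exact equations, so the following is a minimal sketch of how a nested-gate feed-forward block and mirrored weight sharing *might* fit together. The nested-gate formulation (an outer sigmoid gate modulating an inner SwiGLU-style SiLU gate) and all weight names are assumptions for illustration, not Prisma's published implementation.

```python
import numpy as np

def silu(x):
    # SiLU/swish activation, as used in SwiGLU-style blocks
    return x / (1.0 + np.exp(-x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def g2lu_block(x, W_gate_inner, W_gate_outer, W_value, W_out):
    """Hypothetical nested-gate ("gated-gated") feed-forward: an outer
    sigmoid gate modulates the inner SiLU gate before it multiplies the
    value path. Weight names are illustrative, not from the release."""
    inner = silu(x @ W_gate_inner)     # inner gate, as in SwiGLU
    outer = sigmoid(x @ W_gate_outer)  # second, nested gate
    return ((outer * inner) * (x @ W_value)) @ W_out

def mirrored_forward(x, layers):
    """Shared-weight mirroring: run the N unique layers forward, then
    again in reverse order, yielding 2N virtual layers from N
    parameter sets at no extra parameter cost."""
    for layer in layers:
        x = layer(x)
    for layer in reversed(layers):
        x = layer(x)
    return x

# Toy usage: 4 tokens, model width 8, hidden width 16, 3 unique layers
rng = np.random.default_rng(0)
d, h = 8, 16

def make_layer(rng, d, h):
    Wgi = rng.standard_normal((d, h)) * 0.1
    Wgo = rng.standard_normal((d, h)) * 0.1
    Wv = rng.standard_normal((d, h)) * 0.1
    Wo = rng.standard_normal((h, d)) * 0.1
    return lambda t: g2lu_block(t, Wgi, Wgo, Wv, Wo)

layers = [make_layer(rng, d, h) for _ in range(3)]
x = rng.standard_normal((4, d))
y = mirrored_forward(x, layers)
print(y.shape)  # (4, 8) — 6 virtual layer applications from 3 parameter sets
```

Note that a real implementation would interleave these blocks with attention and normalization; the sketch only shows the parameter-sharing arithmetic behind the "2N virtual layers" claim.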
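WoRPE's exact formulation is likewise not given in this summary. A plausible reading, sketched below, is that it reuses standard RoPE rotations but keys them to a *word* index recovered from BPE boundaries (assuming, as in GPT-2's byte-level BPE, that tokens carrying a leading space start a new word). Both the boundary heuristic and the function names are assumptions.

```python
import numpy as np

def word_positions(tokens):
    """Derive a word index per BPE token, assuming a leading space
    marks the start of a new word (GPT-2-style byte-level BPE)."""
    pos, idx = [], -1
    for t in tokens:
        if t.startswith(" ") or idx < 0:
            idx += 1
        pos.append(idx)
    return np.array(pos)

def rope_rotate(q, positions, base=10000.0):
    """Standard RoPE rotation over half-dimension pairs, but driven by
    word positions instead of raw token positions (a guess at WoRPE)."""
    half = q.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # (half,)
    ang = positions[:, None] * freqs[None, :]   # (seq, half)
    cos, sin = np.cos(ang), np.sin(ang)
    q1, q2 = q[..., :half], q[..., half:]
    return np.concatenate([q1 * cos - q2 * sin,
                           q1 * sin + q2 * cos], axis=-1)

tokens = ["The", " quick", " bro", "wn", " fox"]
pos = word_positions(tokens)
print(pos.tolist())  # [0, 1, 2, 2, 3] — "bro"/"wn" share one word index
```

The point of the trick, per the summary, is that this word-boundary signal is already latent in the tokenization, so no new tokenizer is needed; only the attention heads' position rotation changes.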
// TAGS
prisma · llm · open-weights · open-source · research · transformer
DISCOVERED
26d ago (2026-03-16)
PUBLISHED
31d ago (2026-03-11)
RELEVANCE
6/10
AUTHOR
y3i12