YOU ARE VIEWING ONE ITEM FROM THE AICRIER FEED

Prisma garage model beats GPT-2 on 30B tokens

AICrier tracks AI developer news across Product Hunt, GitHub, Hacker News, YouTube, X, arXiv, and more.


// 74d ago · MODEL RELEASE

Solo developer Yuri Ivatchkovitch releases Prisma, a 357M-parameter language model with a mirrored transformer architecture that outperforms GPT-2 Medium on 5 of 8 benchmarks using only 30B training tokens. Key innovations include G²LU nested gating, shared-weight mirrored layers, and Word-position Rotary Position Embedding (WoRPE).

// ANALYSIS

A one-person garage model with genuine architectural novelty that outperforms an established baseline on fewer training tokens (30B vs. 40B). If G²LU and WoRPE hold up at larger scale, they could quietly influence how mainstream architectures handle feature learning.

  • G²LU (Gated-Gated Linear Unit) replaces SwiGLU with nested gates; per the author, this allows 100x higher learning rates and produces saddle-surface decision boundaries that resist memorization
  • Mirrored weight sharing yields 2N virtual layers from N unique parameter sets, a parameter-efficient trick that preserves representational depth without adding parameters (each forward pass still runs all 2N layers)
  • WoRPE encodes word-boundary position geometrically in attention heads, surfacing information already latent in BPE tokenization, so no new tokenizer is needed
  • Beats GPT-2 Medium (trained on 40B tokens) on 5 of 8 benchmarks using only 30B tokens and is particularly strong on reasoning tasks (ARC-Easy +11 pp; BoolQ 0.620)
  • Key caveats: untested beyond ~350M scale, single-developer maintenance, and minimal community traction (2 Reddit upvotes, 230 HF downloads/month)
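The summary does not spell out how the mirrored layers are ordered. A minimal sketch, assuming "mirrored" means the N unique blocks are visited forward and then in reverse (0, 1, ..., N-1, N-1, ..., 0); the names `mirrored_schedule` and `run_mirrored` are hypothetical, and the toy "blocks" stand in for real attention + MLP layers.

```python
from typing import Callable, List, Sequence

# A "block" is any function from a hidden vector to a hidden vector;
# in a real model it would be a full transformer block.
Block = Callable[[List[float]], List[float]]


def mirrored_schedule(n_unique: int) -> List[int]:
    """Order in which N unique blocks are visited to form 2N virtual layers.

    ASSUMPTION: "mirrored" is read as a palindrome: 0..N-1 then N-1..0.
    """
    forward = list(range(n_unique))
    return forward + forward[::-1]


def run_mirrored(x: List[float], blocks: Sequence[Block]) -> List[float]:
    """Apply every shared block twice, following the mirrored schedule."""
    for idx in mirrored_schedule(len(blocks)):
        x = blocks[idx](x)
    return x


if __name__ == "__main__":
    # Toy blocks that each add a constant, so the result is easy to check.
    blocks = [lambda v, k=k: [e + k for e in v] for k in (1.0, 2.0, 3.0)]
    print(mirrored_schedule(3))         # [0, 1, 2, 2, 1, 0]
    print(run_mirrored([0.0], blocks))  # [12.0]
```

Under this reading the layer count doubles while the parameter count does not, but per-token compute still covers all 2N block applications.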
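WoRPE is described only at a high level. One plausible reading, sketched here purely as an assumption: derive a word index for each BPE token from the tokenizer's own word-start markers, then feed that index (instead of the token index) into a standard rotary embedding. The "Ġ" space marker is the GPT-2 byte-level BPE convention; which tokenizer Prisma uses, and whether WoRPE mixes word and token positions or applies only to some heads, is not stated. `word_positions` and `rope_rotate` are hypothetical names.

```python
import math
from typing import List


def word_positions(tokens: List[str]) -> List[int]:
    """Map each BPE token to the index of the word it belongs to.

    ASSUMPTION: GPT-2-style byte-level BPE, where a leading "Ġ" marks a
    token that starts a new space-separated word.
    """
    positions: List[int] = []
    word = -1
    for i, tok in enumerate(tokens):
        if i == 0 or tok.startswith("Ġ"):
            word += 1  # a new word begins at this token
        positions.append(word)
    return positions


def rope_rotate(vec: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Standard rotary embedding: rotate pair (i, i+1) by pos * base**(-i/d)."""
    d = len(vec)
    if d % 2:
        raise ValueError("rotary embedding needs an even dimension")
    out = list(vec)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i is already 2 * pair_index
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out


if __name__ == "__main__":
    # Illustrative token split, not actual GPT-2 tokenizer output.
    tokens = ["The", "Ġtransform", "ers", "Ġrun"]
    pos = word_positions(tokens)
    print(pos)  # [0, 1, 1, 2]
    # Hypothetical WoRPE-style use: rotate each token's query by its word index.
    query = [1.0, 0.0, 0.5, 0.5]
    rotated = [rope_rotate(query, p) for p in pos]
    # Tokens 1 and 2 belong to the same word, so they get the same rotation.
    print(rotated[1] == rotated[2])  # True
```

Because rotary scores depend only on the difference of the two positions, heads using word indices would see word distance rather than token distance.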
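No formula for G²LU is given beyond "nested gates" replacing SwiGLU. For orientation, this sketch shows a standard SwiGLU hidden unit and one hypothetical way to add a second gate (`nested_gate_unit`, which multiplies by the SiLU of a third projection); that choice is an assumption, not the author's design, and the sketch does not test the learning-rate or memorization claims. It does show where saddle-shaped surfaces come from in gated units: since silu(a) has the sign of a, the unit silu(a)·b is positive where a and b share a sign and negative where they differ.

```python
import math


def silu(z: float) -> float:
    """SiLU (Swish-1): z * sigmoid(z), written to avoid exp overflow."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def swiglu_unit(a: float, b: float) -> float:
    """One SwiGLU hidden unit from pre-activations a = x·W_gate, b = x·W_up."""
    return silu(a) * b


def nested_gate_unit(a: float, b: float, c: float) -> float:
    """HYPOTHETICAL nested gate: the SwiGLU output is gated again by silu(c).

    This is NOT Prisma's published G²LU definition, only an illustration.
    """
    return swiglu_unit(a, b) * silu(c)


if __name__ == "__main__":
    # Quadrant sign pattern of a saddle: + where signs agree, - where they differ.
    for a, b in [(2.0, 2.0), (2.0, -2.0), (-2.0, -2.0), (-2.0, 2.0)]:
        print(a, b, round(swiglu_unit(a, b), 3))
```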
// TAGS
prisma, llm, open-weights, open-source, research, transformer

DISCOVERED

74d ago

2026-03-16

PUBLISHED

79d ago

2026-03-11

RELEVANCE

6/10

AUTHOR

y3i12