Prisma drops mirrored transformer, lean-data gains
OPEN_SOURCE
REDDIT // MODEL RELEASE
Prisma is a 357M-parameter experimental language model from solo developer Yuri Ivatchkovitch that combines a mirrored transformer layout with nested G²LU gating and WoRPE positional encoding. Released on Hugging Face after training on roughly 30B tokens using a single H100 GPU, it claims stronger results than GPT-2 Medium on 5 of 8 reported benchmarks while using less training data.

// ANALYSIS

Prisma is more interesting as an architecture experiment than as a must-use model release, and that is exactly why AI developers should pay attention. It is a credible garage-lab attempt to test alternatives to the standard GPT/Llama recipe in public, with enough benchmark signal to justify a closer look.

  • The core novelty is structural: mirrored layers share weights across expand/compress phases, while the extra G²LU gate tries to recover expressiveness without blowing up parameter count
  • The reported wins over GPT-2 Medium matter mostly as proof that the idea is not nonsense, not as evidence Prisma has caught up to the best modern small models
  • The author is unusually explicit about caveats, including reliance on MobileLLM tokenizer embeddings and the possibility that the architecture needs pre-trained embedding geometry to work at all
  • For model builders, the real appeal is reproducibility and cost profile: a public Hugging Face release, detailed model card, and a single-H100 training story make this a useful reference point for small-scale experimentation
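The mirrored weight-sharing idea in the first bullet can be sketched in miniature. Note the card does not specify Prisma's actual implementation, and G²LU is not publicly documented here, so the gate below is a hypothetical sigmoid stand-in; the point is only to show how tying the compress projection to the transpose of the expand projection roughly halves FFN parameters while a separate gate tries to recover expressiveness:

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class MirroredGatedFFN:
    """Hypothetical feed-forward block: the compress phase reuses the
    expand weights transposed (the 'mirrored' idea), and a learned
    sigmoid gate stands in for the undocumented G²LU gating."""

    def __init__(self, d_model, d_hidden):
        # One shared matrix serves both expand (W) and compress (W.T),
        # instead of two independent matrices as in a standard FFN.
        self.W = rng.standard_normal((d_model, d_hidden)) * 0.02
        # Extra gate projection: modest parameter cost, added expressiveness.
        self.W_gate = rng.standard_normal((d_model, d_hidden)) * 0.02

    def __call__(self, x):
        h = gelu(x @ self.W)                        # expand: d_model -> d_hidden
        g = 1.0 / (1.0 + np.exp(-(x @ self.W_gate)))  # sigmoid gate over hidden units
        return (h * g) @ self.W.T                   # compress with tied weights

ffn = MirroredGatedFFN(d_model=64, d_hidden=256)
x = rng.standard_normal((4, 64))
y = ffn(x)
print(y.shape)  # (4, 64)
```

A standard FFN here would need two independent 64×256 matrices; the tied version spends one of them on the gate instead, which is the kind of parameter-for-structure trade the release is experimenting with.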
// TAGS
prisma · llm · open-weights · research · benchmark

DISCOVERED

2026-03-08

PUBLISHED

2026-03-08

RELEVANCE

7 / 10

AUTHOR

y3i12