Shifted Key GDN trims params, keeps pace

// 54d ago · BENCHMARK RESULT

A Reddit experiment reports that Gated Delta Net (GDN) can drop its learned Q/K projections and instead use the current hidden state as the query and the previous hidden state as the key. On undertrained coding runs, the shifted-key variant slightly improves loss while using about 12.5% to 25% fewer parameters per layer.
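To make the mechanism concrete, here is a minimal single-head sketch of a gated delta-rule memory with shifted keys, in plain Python. It is an illustration under stated assumptions, not the experiment's code: the function name `shifted_key_gdn`, the fixed scalar gates `alpha` and `beta` (real GDN layers compute these per token), the L2 normalization of queries and keys, and the zero key used before the first token are all choices made here for clarity.

```python
import math

def l2_normalize(x, eps=1e-6):
    # Scale a vector to (almost) unit length; a zero vector stays zero.
    norm = math.sqrt(sum(v * v for v in x))
    return [v / (norm + eps) for v in x]

def matvec(M, x):
    # Plain matrix-vector product on nested lists.
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def shifted_key_gdn(hidden, W_v, alpha=0.9, beta=0.5):
    """Hypothetical single-head sketch.

    hidden: list of T hidden-state vectors, each of size d.
    W_v:    d_v x d value projection (the only learned projection kept here).
    alpha, beta: fixed scalar decay / write-strength gates (an assumption;
                 actual GDN layers make these data dependent).
    """
    d, d_v = len(hidden[0]), len(W_v)
    S = [[0.0] * d for _ in range(d_v)]  # memory state mapping keys to values
    prev = [0.0] * d                     # assumption: zero key before token 0
    outputs = []
    for h in hidden:
        q = l2_normalize(h)      # query = current hidden state (no W_q)
        k = l2_normalize(prev)   # key   = previous hidden state (no W_k)
        v = matvec(W_v, h)
        # Gated delta rule: S <- alpha * S (I - beta k k^T) + beta v k^T
        Sk = matvec(S, k)        # what the memory currently returns for k
        for i in range(d_v):
            for j in range(d):
                S[i][j] = alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
        outputs.append(matvec(S, q))
        prev = h
    return outputs

# Usage: the state [1, 0] was followed by [0, 1], so seeing [1, 0] again
# retrieves something close to [0.0, 1.0] from memory.
identity = [[1.0, 0.0], [0.0, 1.0]]
out = shifted_key_gdn([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], identity, alpha=1.0, beta=1.0)
print(out[2])
```

In this toy setup each write stores the current value under the previous hidden state, so querying with the current hidden state asks "what followed a state like this one before?", which is one way to read the chaining behavior described in this item.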

// ANALYSIS

The interesting part is not just the parameter cut; it is that the inductive bias seems to fit linear-attention-style memory better than softmax attention. In other words, GDN may be getting more mileage from state geometry and token-to-token chaining than from learned query/key transforms.

  • The reported gain is small but meaningful: equal or better loss with fewer parameters, together with faster convergence, suggests the Q/K projections were partly redundant at this scale.
  • The key clue is that the effect does not transfer to softmax attention, where exact key lookup and normalized retrieval likely depend more on explicit query/key separation.
  • The shifted-key setup bakes in a strong local continuity prior, which may help linear memory blocks because the hidden state already carries enough context to serve as both selector and address.
  • The results are still narrow: one dataset, undertrained runs, and a specific architecture family, so this is a useful architectural signal rather than proof of a general rule.
  • If it holds up at larger scales, it argues for simpler linear-attention blocks that spend capacity on memory dynamics instead of projection layers.
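
As a rough check on the parameter arithmetic, the sketch below computes what share of a layer's weights the Q and K projections account for. The helper `qk_savings_fraction` and the two layer budgets are hypothetical, chosen only to show what kind of layer shape would land at each end of the reported 12.5% to 25% range; the experiment's actual configurations are not stated here.

```python
def qk_savings_fraction(d_model, d_qk, other_params):
    # Parameters in the two dropped projections (W_q and W_k), ignoring biases.
    qk_params = 2 * d_model * d_qk
    # Fraction of the original layer that disappears when both are removed.
    return qk_params / (qk_params + other_params)

d = 1024  # hypothetical model width

# Assumed budget A: everything else in the layer totals 6 * d^2 weights.
print(qk_savings_fraction(d, d, 6 * d * d))   # 0.25

# Assumed budget B: everything else in the layer totals 14 * d^2 weights.
print(qk_savings_fraction(d, d, 14 * d * d))  # 0.125
```

Under these assumptions, the saving shrinks as the rest of the layer (value, output, gate, and MLP weights) grows relative to the Q/K projections.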
// TAGS
shifted-key-gated-delta-net, gated-delta-net, llm, benchmark, research, open-source

DISCOVERED: 54d ago (2026-04-04)

PUBLISHED: 54d ago (2026-04-04)

RELEVANCE: 8/10

AUTHOR: jfguan