OPEN_SOURCE
REDDIT // 8d ago · BENCHMARK RESULT
Shifted Key GDN trims params, keeps pace
A Reddit experiment reports that Gated Delta Net can drop learned Q/K projections and instead use the current hidden state as query and the previous hidden state as key. On undertrained coding runs, the shifted-key variant slightly improves loss while using about 12.5% to 25% fewer layer parameters.
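A minimal sketch of what such a shifted-key update could look like, assuming the standard gated delta rule with a per-step decay gate and write strength. The function name, the retained value projection, and the normalization choices are illustrative assumptions, not details from the post:

```python
import numpy as np

def shifted_key_gdn(h, Wv, alpha, beta):
    """Sketch of a shifted-key gated delta recurrence.

    h:     (T, d) hidden states from the previous layer
    Wv:    (d, d) learned value projection (kept; only Q/K projections are dropped)
    alpha: (T,)   per-step decay gates in [0, 1]
    beta:  (T,)   per-step write strengths in [0, 1]
    """
    T, d = h.shape
    S = np.zeros((d, d))          # associative memory state
    out = np.zeros_like(h)
    k_prev = np.zeros(d)          # previous hidden state serves as the key
    for t in range(T):
        q = h[t] / (np.linalg.norm(h[t]) + 1e-6)       # current hidden state as query
        k = k_prev / (np.linalg.norm(k_prev) + 1e-6)   # shifted key, no learned W_k
        v = Wv @ h[t]
        # gated delta rule: decay the state, erase the old value stored at k,
        # then write the new value v at address k
        S = alpha[t] * (S - beta[t] * np.outer(S @ k, k)) + beta[t] * np.outer(v, k)
        out[t] = S @ q
        k_prev = h[t]
    return out
```

Note the parameter saving: the only learned matrix in this sketch is `Wv`; the query and key come for free from the residual stream, which is where the reported 12.5% to 25% reduction would originate.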
// ANALYSIS
The interesting part is not just the parameter cut; it is that the inductive bias seems to fit linear-attention-style memory better than softmax attention. In other words, GDN may be getting more mileage from state geometry and token-to-token chaining than from learned query/key transforms.
- The reported gain is small but meaningful: equal or better loss with fewer parameters and faster convergence suggests the Q/K projections were partly redundant at this scale.
- The effect not transferring to softmax attention is the key clue; exact key lookup and normalized retrieval there likely depend more on an explicit query/key separation.
- The shifted-key setup bakes in a strong local-continuity prior, which may help linear memory blocks because the hidden state already carries enough context to serve as both selector and address.
- The results are still narrow: one dataset, undertrained runs, and a specific architecture family, so this is a useful architectural signal rather than proof of a general rule.
- If it holds up at larger scales, it argues for simpler linear-attention blocks that spend capacity on memory dynamics instead of projection layers.
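As a back-of-envelope check on the 12.5% to 25% figure, here is a hypothetical per-layer parameter count. The shapes and the 4-matrix mixer / 8-matrix MLP split are illustrative assumptions; real GDN layers also carry gate and convolution parameters not modeled here:

```python
d = 1024                       # hypothetical model width
proj = d * d                   # one square projection matrix
mixer = 4 * proj               # Wq, Wk, Wv, Wo in the token-mixing block
mlp = 8 * proj                 # a d -> 4d -> d feed-forward block
saved = 2 * proj               # dropping the learned Q and K projections

frac_mixer = saved / mixer         # 0.5 of the mixing block alone
frac_layer = saved / (mixer + mlp) # ~0.167 of the full layer
```

Under these assumptions the saving is 50% of the mixing block but roughly 16.7% of the whole layer, which sits inside the 12.5% to 25% range the post reports.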
// TAGS
shifted-key-gated-delta-net · gated-delta-net · llm · benchmark · research · open-source
DISCOVERED
8d ago
2026-04-04
PUBLISHED
8d ago
2026-04-04
RELEVANCE
8 / 10
AUTHOR
jfguan