OPEN_SOURCE
YT · YOUTUBE // 31d ago // RESEARCH PAPER
Grow, Don’t Overwrite curbs catastrophic forgetting
This paper proposes a function-preserving way to expand transformer MLP layers during fine-tuning by duplicating up-projection weights and compensating in the down-projection layer. On Gemma models, it reports downstream performance comparable to standard fine-tuning while preserving much more of the base model’s original capabilities.
// ANALYSIS
This is a sharp continual-learning result because it mitigates forgetting by adding reusable capacity instead of treating retention as a regularization tax.
- The core method is elegant: copy the MLP up-projection, scale the down-projection, and keep the expanded network mathematically identical to the original at initialization so training stays stable.
- The paper shows the clearest gains on high-shift tasks like translation and entailment, where standard fine-tuning erases prior capabilities but the growth-based variants preserve them.
- It is more practical than it first sounds: growing all layers trains roughly 60% of the original parameter count, and growing only 9-10 targeted layers gets close to full performance at roughly 30%.
- The main limitation is scope: the experiments are centered on MLP growth in transformer models, and harder reasoning tasks like MathQA still benefit from a less frozen variant.
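The function-preserving trick in the bullets above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's exact recipe: it assumes a plain two-matrix MLP (`y = W_down · act(W_up · x)`) with an elementwise activation, whereas Gemma-style gated MLPs would need the gate projection duplicated alongside the up-projection. Duplicating hidden units and halving the matching down-projection columns leaves the output unchanged at initialization:

```python
import numpy as np

def grow_mlp(W_up, W_down, n_new):
    """Function-preserving MLP expansion (illustrative sketch).

    W_up:   (hidden, d_in)  up-projection
    W_down: (d_out, hidden) down-projection
    n_new:  number of hidden units to duplicate (here: the first n_new)
    """
    idx = np.arange(n_new)
    # Duplicate rows of the up-projection: the copies compute the
    # same pre-activations (and, elementwise, the same activations).
    W_up_new = np.vstack([W_up, W_up[idx]])
    # Compensate in the down-projection: halve the original columns
    # and append the same halved columns for the copies, so each
    # duplicated unit contributes 0.5 + 0.5 = 1x its old output.
    W_down_new = W_down.copy()
    W_down_new[:, idx] *= 0.5
    W_down_new = np.hstack([W_down_new, W_down_new[:, idx]])
    return W_up_new, W_down_new
```

Because the expanded network is exactly the original function at initialization, fine-tuning can train the new (and rescaled) weights without the loss spike that random expansion would cause.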
// TAGS
grow-dont-overwrite · fine-tuning · llm · research
DISCOVERED
2026-03-11
PUBLISHED
2026-03-11
RELEVANCE
9 / 10
AUTHOR
Discover AI