Grow, Don’t Overwrite curbs catastrophic forgetting
OPEN_SOURCE ↗
YT · YOUTUBE // 31d ago · RESEARCH PAPER


This paper proposes a function-preserving way to expand transformer MLP layers during fine-tuning by duplicating up-projection weights and compensating in the down-projection layer. On Gemma models, it reports downstream performance comparable to standard fine-tuning while preserving much more of the base model’s original capabilities.

// ANALYSIS

This is a sharp continual-learning result because it solves forgetting by adding reusable capacity instead of treating retention as a regularization tax.

  • The core method is elegant: copy the MLP up-projection, scale the down-projection, and keep the expanded network mathematically identical to the original at initialization so training stays stable.
  • The paper shows the clearest gains on high-shift tasks like translation and entailment, where standard fine-tuning erases prior capabilities but the growth-based variants preserve them.
  • It is more practical than it first sounds: growing all layers means training roughly 60% of the original parameter count, while growing only 9–10 targeted layers gets close to full performance at roughly 30%.
  • The main limitation is scope: the experiments are centered on MLP growth in transformer models, and harder reasoning tasks like MathQA still benefit from a less frozen variant.
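
The function-preserving trick in the first bullet can be sketched in a few lines: duplicating the up-projection doubles the hidden width, and since the activation is applied elementwise, halving the duplicated down-projection columns reproduces the original output exactly. This is a minimal NumPy sketch of that idea on a plain (ungated) MLP; `grow_mlp`, the GELU approximation, and the shapes are illustrative assumptions, not the paper's actual code (Gemma's gated MLP would need the gate projection duplicated as well).

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, applied elementwise
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def grow_mlp(W_up, W_down):
    """Double the MLP hidden width without changing its function:
    duplicate the up-projection rows, then halve the (duplicated)
    down-projection columns so the two copies sum to the original."""
    W_up_grown = np.concatenate([W_up, W_up], axis=0)            # (2h, d)
    W_down_grown = np.concatenate([W_down, W_down], axis=1) / 2  # (d, 2h)
    return W_up_grown, W_down_grown

rng = np.random.default_rng(0)
d, h = 8, 32
W_up = rng.normal(size=(h, d))
W_down = rng.normal(size=(d, h))
x = rng.normal(size=d)

y_base = W_down @ gelu(W_up @ x)
W_up2, W_down2 = grow_mlp(W_up, W_down)
y_grown = W_down2 @ gelu(W_up2 @ x)

# The grown network is mathematically identical at initialization,
# so fine-tuning starts from the base model's behavior.
assert np.allclose(y_base, y_grown)
```

Because the identity holds exactly at initialization, gradients can then push the duplicated units apart during fine-tuning, giving the new task its own capacity instead of overwriting the original weights.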
// TAGS
grow-dont-overwrite · fine-tuning · llm · research

DISCOVERED

2026-03-11 (31d ago)

PUBLISHED

2026-03-11 (31d ago)

RELEVANCE

9/10

AUTHOR

Discover AI