Depth Pruning Transfers Across GPT-2, TinyLlama
OPEN_SOURCE
REDDIT · 12d ago · BENCHMARK RESULT


This Reddit post reports that removing the least sensitive transformer layers, then recovering with distillation, shrinks GPT-2 and TinyLlama with modest perplexity loss. The surprising part is that the same pruning recipe appears to transfer cleanly from a 124M GPT-2 setup to TinyLlama 1.1B.

// ANALYSIS

This looks less like a new compression miracle than a strong sign that depth is more redundant than width in these models, especially in the middle layers. The interesting bit is the transferability: the same simple pruning heuristic seems to survive a jump across architectures and scale.
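The heuristic described here, score each layer, drop the least sensitive ones, and leave the early and late layers alone, can be sketched in a few lines of plain Python. The function name, the sensitivity scores, and the `protect` parameter are illustrative assumptions, not details from the post.

```python
def select_prune_layers(sensitivity, k, protect=1):
    """Pick the k least-sensitive layer indices to prune.

    sensitivity: per-layer scores, e.g. the perplexity increase when
    that layer is ablated (higher = more important, so keep it).
    protect: number of layers at each end that are never pruned,
    matching the observation that early and late layers matter
    disproportionately.
    """
    n = len(sensitivity)
    candidates = range(protect, n - protect)  # middle layers only
    ranked = sorted(candidates, key=lambda i: sensitivity[i])
    return sorted(ranked[:k])

# Toy example: layers 0 and 4 are protected, so layers 1-3 compete
# and the two lowest-scoring middle layers win.
print(select_prune_layers([5.0, 0.2, 0.1, 0.3, 4.0], k=2))  # → [1, 2]
```

After pruning, the surviving stack would be fine-tuned with a distillation loss against the original model, which is the recovery stage the post credits for the modest PPL hit.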

  • GPT-2 saw roughly 11-17% parameter reduction with about 9-13% PPL degradation and a reported 1.2x decode speedup
  • TinyLlama 1.1B held up reasonably well when pruned from 22 layers to 20 and to 19, with the 20-layer variant giving the cleaner speed win and only a small PPL ratio bump
  • The stability across three seeds suggests the recovery stage is doing real work, not just riding noise
  • Early and late layers still look disproportionately important, which matches a lot of pruning intuition and makes the “best” layer pair a moving target after recovery
  • This is promising for model compression workflows, but it is still a small-scale result and needs broader benchmark coverage before anyone treats it as a general rule
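To make the reported degradation figures concrete: perplexity is the exponential of the per-token cross-entropy loss, so the PPL ratio between the pruned and original model falls directly out of the loss delta. The helper below is a hypothetical illustration, not code from the post.

```python
import math

def ppl_degradation(loss_before, loss_after):
    """Percent PPL increase given cross-entropy losses in nats/token.

    PPL = exp(loss), so exp(loss_after - loss_before) is exactly the
    post-/pre-pruning PPL ratio.
    """
    return (math.exp(loss_after - loss_before) - 1.0) * 100.0

# A 0.1-nat loss increase is roughly a 10.5% PPL bump, in the same
# ballpark as the 9-13% GPT-2 degradation reported above.
print(round(ppl_degradation(3.00, 3.10), 1))  # → 10.5
```

Framing results as a PPL ratio rather than an absolute PPL also makes runs comparable across tokenizers and evaluation sets, which matters when the claim is cross-architecture transfer.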
// TAGS
llm · benchmark · research · depth-first-pruning · gpt-2 · tinyllama

DISCOVERED

12d ago

2026-03-31

PUBLISHED

12d ago

2026-03-31

RELEVANCE

8/10

AUTHOR

califalcon