OPEN_SOURCE
YT · YOUTUBE // RESEARCH PAPER
SAGE cuts agent tokens 59% via self-built skill library
SAGE (Skill Augmented GRPO for self-Evolution) is an RL framework from AWS Agentic AI and UW-Madison that trains LLM agents to accumulate reusable skills across tasks. On the AppWorld benchmark, it cuts interaction steps by 26% and token generation by 59% versus baseline GRPO, while outperforming prompted GPT-4o and Claude Sonnet by 3x on multi-step scenario completion.
// ANALYSIS
The 59% token reduction is the real headline here — if RL-trained skill reuse generalizes beyond AppWorld, this has direct infrastructure cost implications for anyone running agentic pipelines at scale.
- SAGE extends GRPO with "Sequential Rollout": skills generated for Task 1 are available in Task 2, and Task 2's outcome reward flows back to reinforce good skill generation — true cross-task credit assignment
- The skill library uses four operations (generate, use, update, save), letting agents patch failing skills mid-rollout rather than starting from scratch
- Beating ReAct + GPT-4o / o1 / Claude Sonnet by 3x on scenario completion with a fine-tuned open model is a strong result for the "train smaller, smarter" camp
- SFT initialization used Claude 3.5 Sonnet as an oracle to bootstrap quality trajectories — underscoring how frontier models are increasingly used to distill capabilities into cheaper models
- GitHub repo is public at amazon-science/SAGE; reproducibility is high given that the AppWorld benchmark is open
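The four library operations above can be sketched as a small in-memory store. This is an illustrative assumption of how such a library might work, not the actual amazon-science/SAGE API; all class and method names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Skill:
    name: str
    code: str  # reusable routine the agent wrote for itself

class SkillLibrary:
    """Hypothetical sketch of the four skill operations (generate, use, update, save)."""

    def __init__(self) -> None:
        self._saved: dict[str, Skill] = {}  # persists across tasks in a rollout
        self._draft: dict[str, Skill] = {}  # working set for the current task

    def generate(self, name: str, code: str) -> Skill:
        # GENERATE: the agent writes a new skill mid-rollout.
        skill = Skill(name, code)
        self._draft[name] = skill
        return skill

    def use(self, name: str) -> Optional[str]:
        # USE: retrieve a saved or drafted skill for the current task.
        skill = self._saved.get(name) or self._draft.get(name)
        return skill.code if skill else None

    def update(self, name: str, new_code: str) -> None:
        # UPDATE: patch a failing skill in place rather than starting from scratch.
        skill = self._draft.get(name) or self._saved.get(name)
        if skill is not None:
            skill.code = new_code

    def save(self, name: str) -> None:
        # SAVE: promote a draft so later tasks in the sequential rollout can reuse it.
        if name in self._draft:
            self._saved[name] = self._draft.pop(name)
```

Under Sequential Rollout, a skill generated and saved during Task 1 would be retrieved via `use` in Task 2, and Task 2's outcome reward credited back to the Task-1 generation step — the cross-task credit assignment described above.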
// TAGS
sage · agent · llm · rl · fine-tuning · open-source · research
DISCOVERED
2026-03-15
PUBLISHED
2026-03-15
RELEVANCE
8 / 10
AUTHOR
Discover AI