OPEN_SOURCE
YT · YOUTUBE // RESEARCH PAPER
SAGE cuts agent tokens 59% via self-built skill library
SAGE (Skill Augmented GRPO for self-Evolution) is an RL framework from AWS Agentic AI and UW-Madison that trains LLM agents to accumulate reusable skills across tasks. On the AppWorld benchmark, it cuts interaction steps by 26% and token generation by 59% versus baseline GRPO, while outperforming prompted GPT-4o and Claude Sonnet by 3x on multi-step scenario completion.
// ANALYSIS
The 59% token reduction is the real headline here — if RL-trained skill reuse generalizes beyond AppWorld, this has direct infrastructure cost implications for anyone running agentic pipelines at scale.
- SAGE extends GRPO with "Sequential Rollout": skills generated for Task 1 are available in Task 2, and Task 2's outcome reward flows back to reinforce good skill generation — true cross-task credit assignment
- The skill library uses four operations (generate, use, update, save), letting agents patch failing skills mid-rollout rather than starting from scratch
- Beating ReAct + GPT-4o / o1 / Claude Sonnet by 3x on scenario completion with a fine-tuned open model is a strong result for the "train smaller, smarter" camp
- SFT initialization used Claude 3.5 Sonnet as an oracle to bootstrap quality trajectories — underscoring how frontier models are increasingly used to distill capabilities into cheaper models
- GitHub repo is public at amazon-science/SAGE; reproducibility is high given that the AppWorld benchmark is open
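The four library operations above can be sketched as a small in-memory store. This is an illustrative assumption of how such a library might work, not the actual amazon-science/SAGE API; all class and method names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Skill:
    name: str
    code: str  # reusable routine the agent wrote for itself

class SkillLibrary:
    """Hypothetical sketch of the four skill operations (generate, use, update, save)."""

    def __init__(self) -> None:
        self._saved: dict[str, Skill] = {}  # persists across tasks in a rollout
        self._draft: dict[str, Skill] = {}  # working set for the current task

    def generate(self, name: str, code: str) -> Skill:
        # GENERATE: the agent writes a new skill mid-rollout.
        skill = Skill(name, code)
        self._draft[name] = skill
        return skill

    def use(self, name: str) -> Optional[str]:
        # USE: retrieve a saved or drafted skill for the current task.
        skill = self._saved.get(name) or self._draft.get(name)
        return skill.code if skill else None

    def update(self, name: str, new_code: str) -> None:
        # UPDATE: patch a failing skill in place rather than starting from scratch.
        skill = self._draft.get(name) or self._saved.get(name)
        if skill is not None:
            skill.code = new_code

    def save(self, name: str) -> None:
        # SAVE: promote a draft so later tasks in the sequential rollout can reuse it.
        if name in self._draft:
            self._saved[name] = self._draft.pop(name)
```

Under Sequential Rollout, a skill generated and saved during Task 1 would be retrieved via `use` in Task 2, and Task 2's outcome reward credited back to the Task-1 generation step — the cross-task credit assignment described above.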
// TAGS
sage · agent · llm · rl · fine-tuning · open-source · research
DISCOVERED
2026-03-15
PUBLISHED
2026-03-15
RELEVANCE
8 / 10
AUTHOR
Discover AI