RUMAD cuts debate token costs with RL
YOUTUBE // 37d ago // RESEARCH PAPER

RUMAD is a research paper on multi-agent debate that replaces fixed or fully connected agent communication with an RL controller that dynamically rewires the debate graph. The result is a much cheaper reasoning setup: over 80% lower token cost than fully connected baselines on MMLU and GSM8K while maintaining or improving accuracy, plus zero-shot transfer from MMLU training to GPQA and GSM8K.
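To make the idea of a rewired debate graph concrete, here is a minimal sketch of how gating edges can shrink per-round token cost. It is not the paper's implementation: the threshold rule, the `active_edges` and `round_token_cost` helpers, the weight matrix, and the message lengths are all hypothetical, and the weights are hard-coded where RUMAD would have its RL controller produce them.

```python
from typing import List, Tuple

def active_edges(weights: List[List[float]], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (sender, receiver) pairs whose edge weight clears the threshold.

    Assumption: a simple cutoff stands in for whatever gating the controller learns.
    """
    n = len(weights)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and weights[i][j] >= threshold
    ]

def round_token_cost(edges: List[Tuple[int, int]], message_tokens: List[int]) -> int:
    """Tokens read in one round: each active edge delivers the sender's full message."""
    return sum(message_tokens[sender] for sender, _ in edges)

# Hypothetical edge weights for three agents (row = sender, column = receiver).
weights = [
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.7],
    [0.1, 0.3, 0.0],
]
# Hypothetical message length, in tokens, for each agent this round.
message_tokens = [200, 150, 180]

sparse_edges = active_edges(weights)
full_edges = active_edges([[1.0] * 3 for _ in range(3)])

sparse_cost = round_token_cost(sparse_edges, message_tokens)
full_cost = round_token_cost(full_edges, message_tokens)
print(sparse_edges, sparse_cost, full_cost)
```

In this toy setup only two of the six possible edges stay active, so the round costs 350 tokens instead of 1,060. The numbers are illustrative only and say nothing about the paper's measured savings.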

// ANALYSIS

This is a strong paper because it attacks the real bottleneck in multi-agent systems: most debate frameworks waste tokens by treating every agent connection as equally valuable.

  • RUMAD uses a PPO-trained controller to adjust edge weights round by round, so agents exchange information only over the connections the controller judges useful
  • The content-agnostic controller is a smart design choice because it avoids injecting a privileged judge model and keeps coordination separate from reasoning
  • The biggest practical win is cost efficiency: 68% accuracy on MMLU at 11.4k tokens versus 49% for fully connected multi-agent debate (MAD) at 62.6k, with similarly large savings on GSM8K
  • Zero-shot transfer from MMLU to GPQA and GSM8K suggests the learned communication policy is more general than a benchmark-specific prompt hack
  • For agent builders, the paper makes a persuasive case that topology control and agent activation matter as much as model choice when scaling multi-agent reasoning
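
The MMLU figures quoted above can be checked against the "over 80%" claim with a few lines of arithmetic. This is a quick sketch using only the numbers in this summary; the function name `token_reduction` is just an illustrative helper.

```python
def token_reduction(baseline_tokens: float, new_tokens: float) -> float:
    """Fraction of tokens saved relative to the baseline."""
    return 1.0 - new_tokens / baseline_tokens

# Token counts (in thousands) as quoted in this summary for MMLU.
full_mad_tokens = 62.6
rumad_tokens = 11.4

reduction = token_reduction(full_mad_tokens, rumad_tokens)
print(f"Token reduction on MMLU: {reduction:.1%}")  # prints "Token reduction on MMLU: 81.8%"
```

So 11.4k versus 62.6k tokens works out to roughly an 82% reduction, consistent with the headline claim.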
// TAGS
rumad · agent · reasoning · llm · research

DISCOVERED

37d ago

2026-03-06

PUBLISHED

37d ago

2026-03-06

RELEVANCE

8/10

AUTHOR

Discover AI