OPEN_SOURCE
YT · YOUTUBE // 37d ago // RESEARCH PAPER
RUMAD cuts debate token costs with RL
RUMAD is a research paper on multi-agent debate that replaces fixed or fully connected agent communication with an RL controller that dynamically rewires the debate graph. The result is a much cheaper reasoning setup: over 80% lower token cost than fully connected baselines on MMLU and GSM8K while maintaining or improving accuracy, plus zero-shot transfer from MMLU training to GPQA and GSM8K.
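To make the idea of a rewired debate graph concrete, here is a minimal sketch of one debate round in which a weighted adjacency matrix decides which messages each agent receives. It is an illustration under stated assumptions, not the paper's implementation: the function names, the 0.5 threshold, the hand-written weights, and the word-count token proxy are all hypothetical, and the RL controller that would produce the weights is not modeled.

```python
from typing import List


def visible_messages(
    weights: List[List[float]],
    messages: List[str],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[List[str]]:
    """Return, for each receiving agent i, the messages from senders j
    whose edge weight weights[i][j] meets the threshold (self edges skipped)."""
    n = len(messages)
    return [
        [messages[j] for j in range(n) if j != i and weights[i][j] >= threshold]
        for i in range(n)
    ]


def tokens_exchanged(inbox: List[List[str]]) -> int:
    """Crude token proxy: count whitespace-separated words delivered to all agents."""
    return sum(len(msg.split()) for received in inbox for msg in received)


# Three agents' messages from the previous round (made-up content).
messages = ["answer is B", "I think C because of X", "B"]

# Fully connected baseline: every agent reads every other agent.
full = [[1.0] * 3 for _ in range(3)]

# Hypothetical controller output: only two edges are strong enough to keep.
sparse = [
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.1],
    [0.8, 0.3, 0.0],
]

full_cost = tokens_exchanged(visible_messages(full, messages))
sparse_cost = tokens_exchanged(visible_messages(sparse, messages))
print(full_cost, sparse_cost)
```

In this toy setup the sparse graph delivers fewer words than the fully connected one because agent 1 receives nothing and the other two agents each read a single peer. A learned controller would set these weights per round rather than by hand.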
// ANALYSIS
This is a strong paper because it attacks the real bottleneck in multi-agent systems: most debate frameworks waste tokens by treating every agent connection as equally valuable.
- RUMAD uses a PPO-trained controller to adjust edge weights round by round, so agents exchange information only when it is actually useful
- The content-agnostic controller is a smart design choice because it avoids injecting a privileged judge model and keeps coordination separate from reasoning
- The biggest practical win is cost efficiency: 68% accuracy on MMLU at 11.4k tokens versus 49% for full (fully connected) MAD at 62.6k tokens, with similarly large savings on GSM8K
- Zero-shot transfer from MMLU to GPQA and GSM8K suggests the learned communication policy is more general than a benchmark-specific prompt hack
- For agent builders, the paper makes a persuasive case that topology control and agent activation matter as much as model choice when scaling multi-agent reasoning
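The "over 80% lower token cost" claim can be checked against the MMLU figures quoted in the list above. The short calculation below is only arithmetic on those two reported numbers; the helper name `token_reduction` is an illustrative choice.

```python
def token_reduction(baseline_tokens: float, new_tokens: float) -> float:
    """Fractional reduction in token cost relative to the baseline."""
    return 1.0 - new_tokens / baseline_tokens


# MMLU figures as quoted in the summary (thousands of tokens).
full_mad_tokens = 62.6
rumad_tokens = 11.4

reduction = token_reduction(full_mad_tokens, rumad_tokens)
print(f"{reduction:.1%}")  # prints 81.8%
```

A reduction of roughly 81.8% is consistent with the headline figure of over 80% for MMLU.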
// TAGS
rumad · agent · reasoning · llm · research
DISCOVERED
37d ago
2026-03-06
PUBLISHED
37d ago
2026-03-06
RELEVANCE
8/10
AUTHOR
Discover AI