OPEN_SOURCE ↗
REDDIT · REDDIT// 11d agoOPENSOURCE RELEASE
Together AI drops Aurora for adaptive speculative training
Together AI's Aurora is an open-source reinforcement learning framework that continuously updates speculative decoding draft models during live inference. By creating a "serve-to-train" flywheel, it eliminates the "stale model" problem and delivers up to 1.5x speedups on frontier models without expensive offline pretraining or distillation pipelines.
// ANALYSIS
Aurora is a major step toward self-improving inference stacks that adapt to real-world traffic patterns in real-time. It signals the end of the static "train-then-serve" era for speculative decoding.
- –Decouples training and inference servers to allow "hot-swapping" model weights without service downtime
- –Achieves 1.25x-1.5x performance gains even when starting from an untrained "Day-0" speculator
- –Uses a novel Tree Attention mechanism to process both accepted and rejected token branches in a single efficient pass
- –Built on SGLang and open-sourced, providing a blueprint for the next generation of adaptive inference
// TAGS
auroratogether-aillminferenceopen-sourceresearchsglang
DISCOVERED
11d ago
2026-04-01
PUBLISHED
11d ago
2026-03-31
RELEVANCE
9/ 10
AUTHOR
incarnadine72