Together AI drops Aurora for adaptive speculative training

// 102d ago · OPENSOURCE RELEASE

Together AI's Aurora is an open-source reinforcement learning framework that continuously updates speculative decoding draft models during live inference. By creating a "serve-to-train" flywheel, it eliminates the "stale model" problem and delivers up to 1.5x speedups on frontier models without expensive offline pretraining or distillation pipelines.
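The "serve-to-train" idea can be illustrated with a deliberately tiny sketch. This is not Aurora's code: the real system trains a neural draft model with reinforcement learning, while the hypothetical `CountingDraft` below only counts successors, and `target_next` is a made-up stand-in for the large target model. What it shows is the shape of the loop: the draft proposes, the target verifies, and every verification result is fed straight back into the draft.

```python
from collections import Counter, defaultdict


class CountingDraft:
    """Hypothetical toy draft model: guesses the most frequent successor seen so far."""

    def __init__(self):
        self.successors = defaultdict(Counter)

    def predict(self, prev):
        counts = self.successors[prev]
        return counts.most_common(1)[0][0] if counts else None

    def update(self, prev, actual):
        # Training signal taken directly from served traffic.
        self.successors[prev][actual] += 1


def target_next(prev):
    """Stand-in for the target model (assumption): a fixed deterministic mapping."""
    return {"a": "b", "b": "c", "c": "a"}[prev]


def serve(draft, start, steps):
    """Serve `steps` tokens, updating the draft after each verification.

    Returns the fraction of draft guesses the target accepted.
    """
    accepted = 0
    prev = start
    for _ in range(steps):
        guess = draft.predict(prev)
        actual = target_next(prev)  # the target verifies the draft's guess
        if guess == actual:
            accepted += 1
        draft.update(prev, actual)  # learn from the outcome, accepted or not
        prev = actual
    return accepted / steps


draft = CountingDraft()  # untrained, analogous to a "Day-0" speculator
first_rate = serve(draft, "a", 6)
second_rate = serve(draft, "a", 6)
print(first_rate, second_rate)
```

The acceptance rate rises between the two calls because the draft has seen the traffic it is asked to predict. A higher acceptance rate is what turns into a decoding speedup in speculative decoding.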

// ANALYSIS

Aurora is a major step toward self-improving inference stacks that adapt to real-world traffic patterns in real time. It signals the end of the static "train-then-serve" era for speculative decoding.

– Decouples training and inference servers to allow "hot-swapping" model weights without service downtime
– Achieves 1.25x-1.5x performance gains even when starting from an untrained "Day-0" speculator
– Uses a novel Tree Attention mechanism to process both accepted and rejected token branches in a single efficient pass
– Built on SGLang and open-sourced, providing a blueprint for the next generation of adaptive inference
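To make the Tree Attention point concrete, here is a minimal sketch of the attention mask commonly used in tree-based speculative decoding, where each candidate token may attend only to itself and its ancestors. Whether Aurora builds its mask exactly this way is an assumption; the function name `tree_attention_mask` and the `parents` encoding are illustrative and not taken from the project.

```python
def tree_attention_mask(parents):
    """Build a boolean attention mask for a token tree.

    parents[i] is the index of node i's parent, or -1 for the root.
    mask[i][j] is True when node i may attend to node j, that is,
    when j is i itself or one of i's ancestors.
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:  # walk from the node up to the root
            mask[i][j] = True
            j = parents[j]
    return mask


# Hypothetical draft tree: node 0 is the root, nodes 1 and 2 are
# competing continuations of 0, and node 3 extends node 1.
parents = [-1, 0, 0, 1]
mask = tree_attention_mask(parents)
for row in mask:
    print(["x" if allowed else "." for allowed in row])
```

Because sibling branches never attend to each other, all branches can share one forward pass. That includes branches that will end up rejected, which is how both outcomes can be processed together.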

// TAGS

aurora, together-ai, llm, inference, open-source, research, sglang

DISCOVERED

102d ago

2026-04-01

PUBLISHED

102d ago

2026-03-31

RELEVANCE

9/10

AUTHOR

incarnadine72
