DeepSeek open-sources DSpark speculative decoding framework
DeepSeek has open-sourced DSpark, a confidence-scheduled speculative decoding framework, along with DeepSpec, its training and evaluation codebase. Deployed in production for DeepSeek-V4 (Flash and Pro), DSpark uses a semi-autoregressive architecture to speed up LLM generation by 60% to 85%.
Speculative decoding is moving from academic theory to a core production requirement for web-scale LLM serving, and DeepSeek's deployment indicates that semi-autoregressive draft models can mitigate the usual degradation in acceptance rate.
- Semi-Autoregressive Drafts: Combining parallel draft generation with a lightweight serial model preserves token dependency modeling.
- Real-World Validation: Live production deployment on DeepSeek-V4 (Flash and Pro) shows 60% to 85% speedups without degradation in quality or throughput.
- Full-Stack Codebase: Open-sourcing the DeepSpec training and evaluation framework lets others build and optimize their own draft models.
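
For readers new to the technique, the draft-then-verify loop behind speculative decoding can be sketched with toy models. This is a minimal illustration under stated assumptions, not DSpark's implementation: `target_next`, `draft_next`, `max_draft`, and `min_conf` are hypothetical names, the "models" are simple deterministic functions, decoding is greedy, and the confidence threshold is only one plausible reading of "confidence-scheduled".

```python
from typing import Callable, List, Tuple

Token = int

def target_next(prefix: List[Token]) -> Token:
    # Hypothetical stand-in for the large target model (greedy, deterministic).
    return (sum(prefix) * 31 + len(prefix)) % 50

def draft_next(prefix: List[Token]) -> Tuple[Token, float]:
    # Hypothetical stand-in for the draft model: returns (token, confidence).
    # It is wrong, and reports low confidence, at every fourth position.
    guess = target_next(prefix)
    if len(prefix) % 4 == 3:
        return (guess + 1) % 50, 0.55
    return guess, 0.95

def greedy_generate(prompt: List[Token], n_new: int,
                    target: Callable[[List[Token]], Token]) -> List[Token]:
    # Baseline: one target step per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out

def speculative_generate(prompt: List[Token], n_new: int,
                         target: Callable[[List[Token]], Token],
                         draft: Callable[[List[Token]], Tuple[Token, float]],
                         max_draft: int = 4,
                         min_conf: float = 0.6) -> Tuple[List[Token], int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    verify_rounds = 0
    while len(out) < goal:
        # Draft phase: keep proposing tokens while the draft model is confident.
        drafted: List[Token] = []
        while len(drafted) < max_draft:
            tok, conf = draft(out + drafted)
            if conf < min_conf:
                break
            drafted.append(tok)
        # Verify phase: a real system scores all drafted positions in a single
        # target forward pass; this toy loops but counts it as one round.
        verify_rounds += 1
        accepted: List[Token] = []
        for tok in drafted:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch
                break
            accepted.append(tok)
        else:
            # Every drafted token matched (or none were drafted):
            # the target contributes one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], verify_rounds

if __name__ == "__main__":
    tokens, rounds = speculative_generate([1, 2, 3], 20, target_next, draft_next)
    print(tokens == greedy_generate([1, 2, 3], 20, target_next), rounds)
```

Because every drafted token is checked against the target, the output matches plain greedy decoding token for token; the saving comes from needing fewer verification rounds than generated tokens. Stopping the draft early when confidence is low avoids spending draft steps on tokens likely to be rejected.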
DISCOVERED
2026-06-27
PUBLISHED
2026-06-27
AUTHOR
aurenvale