OPEN_SOURCE
X // 6h ago · INFRASTRUCTURE
DigitalOcean tops DeepSeek, Qwen inference charts
DigitalOcean says its Serverless Inference platform now serves DeepSeek V3.2, MiniMax-M2.5, and Qwen 3.5 397B at the top of Artificial Analysis speed charts, with DeepSeek V3.2 hitting 230 output tokens per second and a 0.96s time to first token (TTFT) on 10K-token prompts. The post frames this as a GPU- and serving-stack optimization win on NVIDIA Blackwell Ultra, not a new model release.
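For context on what those two numbers measure, here is a minimal sketch of how TTFT and output tokens per second are typically captured against a streaming, OpenAI-compatible chat completions endpoint; the base URL, API-key environment variables, and model identifier are illustrative assumptions, not DigitalOcean's documented values.

```python
# Minimal sketch: measure TTFT and approximate output tokens/sec from a
# streaming OpenAI-compatible endpoint. Endpoint URL, env var names, and
# model id are assumptions for illustration only.
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url=os.environ["INFERENCE_BASE_URL"],  # assumed endpoint env var
    api_key=os.environ["INFERENCE_API_KEY"],
)

prompt = "Summarize the following document: ..."  # pad toward ~10K tokens to mirror the test

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="deepseek-v3.2",  # assumed model identifier
    messages=[{"role": "user", "content": prompt}],
    stream=True,
    max_tokens=512,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks += 1  # chunk count approximates output token count

end = time.perf_counter()
print(f"TTFT: {first_token_at - start:.2f}s")
print(f"~{chunks / (end - first_token_at):.0f} output chunks/s (rough tokens/s)")
```

An exact token rate would need the model's tokenizer; counting stream chunks is a common rough proxy.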
// ANALYSIS
This is an infrastructure flex, not a model breakthrough: DigitalOcean is trying to prove it can turn commodity open-weight models into low-latency production primitives.
- The interesting part is the stack work, not the headline number: Blackwell Ultra GPUs, NVFP4 quantization, speculative decoding, and vLLM tuning all contribute (see the sketch after this list)
- 230 tok/s plus sub-1s TTFT is the kind of profile that matters for agent loops, copilots, and chat UX more than raw benchmark vanity
- DigitalOcean is positioning itself against hyperscalers and specialist inference vendors on performance, which raises the bar for what “simple cloud” needs to mean in AI
- The caveat is obvious: these are vendor-published benchmark results on specific models and prompt sizes, so production performance will depend on workload shape and concurrency
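The stack-side levers in the first bullet map onto knobs that exist in stock vLLM; below is a minimal sketch using vLLM's offline Python API, assuming a single multi-GPU node. The checkpoint name, parallelism degree, and quantization value are illustrative assumptions, and NVFP4 checkpoints plus speculative decoding need version-specific vLLM configuration that is not shown here.

```python
# Minimal sketch of the serving-stack side with vLLM's offline Python API.
# Checkpoint name, GPU count, and quantization value are assumptions;
# NVFP4 and speculative decoding require version-specific configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3.2",  # assumed checkpoint identifier
    tensor_parallel_size=8,             # assumed GPUs per node
    quantization="fp8",                 # stand-in; the post describes NVFP4
    max_model_len=16384,                # headroom for 10K-token prompts
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["<~10K-token prompt here>"], params)
print(outputs[0].outputs[0].text)
```

The point of the sketch is that most of the claimed speedup comes from configuration and kernel-level choices around an open-weight checkpoint, not from changes to the model itself.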
// TAGS
digitalocean · inference · gpu · benchmark · llm · api
DISCOVERED
2026-05-01 (6h ago)
PUBLISHED
2026-04-30 (6h ago)
RELEVANCE
8/10
AUTHOR
digitalocean