
DigitalOcean tops DeepSeek, Qwen inference charts

DigitalOcean says its Serverless Inference platform now serves DeepSeek V3.2, MiniMax-M2.5, and Qwen 3.5 397B at the top of Artificial Analysis speed charts, with DeepSeek V3.2 hitting 230 output tokens per second and a 0.96s time to first token (TTFT) on 10K-token prompts. The post frames this as a GPU- and serving-stack optimization win on NVIDIA Blackwell Ultra, not a new model release.

// ANALYSIS

This is an infrastructure flex, not a model breakthrough: DigitalOcean is trying to prove it can turn commodity open-weight models into low-latency production primitives.

  • The interesting part is the stack work, not the headline number: Blackwell Ultra GPUs, NVFP4 quantization, speculative decoding, and vLLM tuning all contribute
  • 230 tok/s plus sub-1s TTFT is the kind of profile that matters for agent loops, copilots, and chat UX more than raw benchmark vanity
  • DigitalOcean is positioning itself against hyperscalers and specialist inference vendors on performance, which raises the bar for what “simple cloud” needs to mean in AI
  • The caveat is obvious: these are vendor-published benchmark results on specific models and prompt sizes, so production performance will depend on workload shape and concurrency
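The latency math behind that profile is straightforward: end-to-end response time is roughly TTFT plus output tokens divided by decode throughput. A minimal sketch using the figures from the post (TTFT and tok/s are the vendor-published DeepSeek V3.2 numbers; the response lengths are illustrative assumptions):

```python
# Rough end-to-end latency model for a streaming LLM response:
# total ≈ TTFT + output_tokens / decode_throughput.
# TTFT and tok/s are DigitalOcean's published DeepSeek V3.2 figures;
# the response lengths below are illustrative, not from the post.

TTFT_S = 0.96          # time to first token, seconds (10K-token prompt)
TOKENS_PER_S = 230.0   # sustained output throughput

def response_latency(output_tokens: int) -> float:
    """Approximate wall-clock seconds to stream a full response."""
    return TTFT_S + output_tokens / TOKENS_PER_S

for n in (50, 500, 2000):   # short tool call, chat reply, long draft
    print(f"{n:>5} tokens -> {response_latency(n):.2f}s")
```

This is why the profile matters more for agent loops than raw throughput alone: a planner making five sequential 50-token tool calls spends roughly 5.9s at this profile, and TTFT, not tok/s, dominates that total.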
// TAGS
digitalocean · inference · gpu · benchmark · llm · api

DISCOVERED

2026-05-01

PUBLISHED

2026-04-30

RELEVANCE

8/10

AUTHOR

digitalocean