OPEN_SOURCE
X // 6h ago · INFRASTRUCTURE
DigitalOcean tops DeepSeek, Qwen inference charts
DigitalOcean says its Serverless Inference platform now serves DeepSeek V3.2, MiniMax-M2.5, and Qwen 3.5 397B at the top of Artificial Analysis speed charts, with DeepSeek V3.2 hitting 230 output tokens per second and a 0.96s time to first token (TTFT) on 10K-token prompts. The post frames this as a GPU- and serving-stack optimization win on NVIDIA Blackwell Ultra, not a new model release.
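For context on what those two numbers measure, here is a minimal sketch of how TTFT and output tokens per second are typically captured against a streaming, OpenAI-compatible chat completions endpoint; the base URL, API-key environment variables, and model identifier are illustrative assumptions, not DigitalOcean's documented values.

```python
# Minimal sketch: measure TTFT and approximate output tokens/sec from a
# streaming OpenAI-compatible endpoint. Endpoint URL, env var names, and
# model id are assumptions for illustration only.
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url=os.environ["INFERENCE_BASE_URL"],  # assumed endpoint env var
    api_key=os.environ["INFERENCE_API_KEY"],
)

prompt = "Summarize the following document: ..."  # pad toward ~10K tokens to mirror the test

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="deepseek-v3.2",  # assumed model identifier
    messages=[{"role": "user", "content": prompt}],
    stream=True,
    max_tokens=512,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks += 1  # chunk count approximates output token count

end = time.perf_counter()
print(f"TTFT: {first_token_at - start:.2f}s")
print(f"~{chunks / (end - first_token_at):.0f} output chunks/s (rough tokens/s)")
```

An exact token rate would need the model's tokenizer; counting stream chunks is a common rough proxy.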
// ANALYSIS
This is an infrastructure flex, not a model breakthrough: DigitalOcean is trying to prove it can turn commodity open-weight models into low-latency production primitives.
- The interesting part is the stack work, not the headline number: Blackwell Ultra GPUs, NVFP4 quantization, speculative decoding, and vLLM tuning all contribute (see the sketch after this list)
- 230 tok/s plus sub-1s TTFT is the kind of profile that matters for agent loops, copilots, and chat UX more than raw benchmark vanity
- DigitalOcean is positioning itself against hyperscalers and specialist inference vendors on performance, which raises the bar for what “simple cloud” needs to mean in AI
- The caveat is obvious: these are vendor-published benchmark results on specific models and prompt sizes, so production performance will depend on workload shape and concurrency
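The stack-side levers in the first bullet map onto knobs that exist in stock vLLM; below is a minimal sketch using vLLM's offline Python API, assuming a single multi-GPU node. The checkpoint name, parallelism degree, and quantization value are illustrative assumptions, and NVFP4 checkpoints plus speculative decoding need version-specific vLLM configuration that is not shown here.

```python
# Minimal sketch of the serving-stack side with vLLM's offline Python API.
# Checkpoint name, GPU count, and quantization value are assumptions;
# NVFP4 and speculative decoding require version-specific configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3.2",  # assumed checkpoint identifier
    tensor_parallel_size=8,             # assumed GPUs per node
    quantization="fp8",                 # stand-in; the post describes NVFP4
    max_model_len=16384,                # headroom for 10K-token prompts
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["<~10K-token prompt here>"], params)
print(outputs[0].outputs[0].text)
```

The point of the sketch is that most of the claimed speedup comes from configuration and kernel-level choices around an open-weight checkpoint, not from changes to the model itself.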
// TAGS
digitalocean · inference · gpu · benchmark · llm · api
DISCOVERED
2026-05-01 (6h ago)
PUBLISHED
2026-04-30 (6h ago)
RELEVANCE
8/10
AUTHOR
digitalocean