Kimi K2.6 hits 264 t/s on MI50s
OPEN_SOURCE
REDDIT // 6h ago // BENCHMARK RESULT


A custom vllm-gfx906-mobydick stack on 32 AMD MI50 32GB cards reached 9.7 tok/s output and 263 tok/s prefill on moonshotai/Kimi-K2.6. The run spans two 16-GPU nodes and the author says it is power-hungry, bandwidth-limited, and still not fully optimized.
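The post does not include the exact launch command. On upstream vLLM, a two-node, 16-GPU-per-node layout would typically be expressed as tensor parallelism within each node and pipeline parallelism across nodes over a Ray backend; the flag names below come from stock vLLM, and the gfx906 fork's INT4 path may require additional fork-specific options:

```shell
# Hypothetical invocation using upstream vLLM flags -- the
# vllm-gfx906-mobydick fork may differ in names and defaults.
# Layout: TP=16 within a node, PP=2 across the two 16-GPU nodes.
vllm serve moonshotai/Kimi-K2.6 \
  --tensor-parallel-size 16 \
  --pipeline-parallel-size 2 \
  --distributed-executor-backend ray
```

With this split, each pipeline stage holds half the layers sharded sixteen ways, which is why cross-node bandwidth and PCIe link quality dominate the numbers the author reports.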

// ANALYSIS

This is a strong systems benchmark, but it reads more like a proof-of-concept for keeping old Instinct hardware relevant than a practical deployment recipe.

  • The result splits sharply between generation and prefill: 9.7 tok/s output is far too slow for interactive chat, while 263 tok/s prefill handles long prompts acceptably, so the setup favors batch ingest and long-context work.
  • PCIe instability and only partial bandwidth on the risers likely left performance on the table; the author explicitly expects better numbers with cleaner links.
  • The real story is the AMD ROCm/vLLM fork: `vllm-gfx906-mobydick` is doing the heavy lifting to make gfx906-era cards useful for modern MoE models.
  • Kimi K2.6 is a sensible stress test here because it ships as an open-weight model with native INT4 support and is meant to run on vLLM-class stacks.
  • For most teams, this is a curiosity unless power is cheap; for tinkerers with used MI50s, it is a useful datapoint on the ceiling of self-hosted inference.
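As a sanity check on why 32 cards are involved at all, a back-of-envelope VRAM calculation is instructive. The post states neither a parameter count nor a memory budget, so the figures below assume a Kimi K2-class MoE of roughly 1T total parameters stored at INT4:

```python
# Back-of-envelope check: do INT4 weights for an assumed ~1T-parameter
# MoE fit across 32 x 32 GB MI50 cards? (Parameter count is an
# assumption -- the post does not state it.)
total_params = 1.0e12        # assumed Kimi K2-class total parameters
bytes_per_param = 0.5        # INT4 = 4 bits per weight
weight_gb = total_params * bytes_per_param / 1e9

num_gpus = 32
vram_per_gpu_gb = 32
total_vram_gb = num_gpus * vram_per_gpu_gb

headroom_gb = total_vram_gb - weight_gb  # left for KV cache, activations
print(f"weights ~= {weight_gb:.0f} GB, "
      f"cluster VRAM = {total_vram_gb} GB, "
      f"headroom ~= {headroom_gb:.0f} GB")
```

Under those assumptions the weights alone consume roughly half a terabyte, leaving the remaining VRAM for KV cache and activations, which is consistent with needing the full two-node cluster rather than a single 16-GPU box.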
// TAGS
kimi-k2-6 · benchmark · inference · gpu · self-hosted · open-source · llm

DISCOVERED
6h ago · 2026-05-01

PUBLISHED
8h ago · 2026-04-30

RELEVANCE
8/10

AUTHOR
ai-infos