OPEN_SOURCE
HN · HACKER_NEWS // 2h ago
BENCHMARK RESULT
Kimi K2.6 tops Claude, GPT-5.5, Gemini
Moonshot’s Kimi K2.6 finished first in AICC’s Word Gem Puzzle, a timed coding contest built around sliding-letter word discovery and head-to-head opponent races. It edged out MiMo and outperformed Claude, GPT-5.5, Gemini, Grok, and DeepSeek in a test of agentic coding that is more game-like than a static benchmark.
// ANALYSIS
This is a useful signal because the format rewards search, planning, and execution under pressure, not just benchmark memorization. But it’s still a narrow puzzle environment, so treat the result as evidence of strong agent behavior, not a universal ranking of model quality.
- Kimi won the tournament on match points, with GPT-5.5 third and Claude fifth, which suggests the model is genuinely competitive in timed multi-step tasks
- The contest structure matters: 10-second rounds, 1v1 matches, and changing board sizes favor models that can reason and act quickly (see the sketch after this list)
- Kimi’s open-weights status makes the result more interesting for developers, since the model can be studied, deployed, and adapted without waiting on a closed API
- The win reinforces Moonshot’s positioning around long-horizon coding and agent orchestration, not just chat or reasoning demos
- Still, a puzzle contest is not production software engineering; the next question is whether these strengths carry over to real repos, flaky tests, and messy toolchains
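To make the "reason and act quickly" point concrete, here is a minimal sketch of the kind of search-under-deadline problem a timed word-discovery round implies. The source does not describe the Word Gem Puzzle's actual rules beyond "sliding-letter word discovery," so the Boggle-style adjacency mechanics, the tiny dictionary, and the time budget below are all illustrative assumptions, not the contest's real specification.

```python
# Illustrative sketch only: a prefix-pruned DFS over a letter grid with a
# hard deadline. The grid rules and dictionary are assumptions; the real
# contest's sliding-letter mechanics are not specified in the source.
import time

DICTIONARY = {"cat", "cats", "act", "acts", "sat", "scat", "cast"}
# Precompute every prefix so the DFS can abandon dead branches immediately.
PREFIXES = {w[:i] for w in DICTIONARY for i in range(1, len(w) + 1)}

def find_words(board, deadline):
    """Depth-first search over adjacent cells, stopping at the deadline."""
    rows, cols = len(board), len(board[0])
    found = set()

    def dfs(r, c, prefix, visited):
        if time.monotonic() >= deadline:  # respect the round's time budget
            return
        prefix += board[r][c]
        if prefix not in PREFIXES:
            return  # prune: no dictionary word starts this way
        if prefix in DICTIONARY:
            found.add(prefix)
        visited.add((r, c))
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in visited):
                    dfs(nr, nc, prefix, visited)
        visited.remove((r, c))

    for r in range(rows):
        for c in range(cols):
            dfs(r, c, "", set())
    return found

board = [list("cas"), list("ats"), list("tca")]
# A 10-second round leaves little headroom; budget well under the limit.
print(find_words(board, deadline=time.monotonic() + 2.0))
```

The pruning step is the point: an agent that explores unpromising branches burns its 10 seconds, while one that plans around the deadline and cuts dead prefixes early finds more words per round. That is the behavior this contest appears to reward over static-benchmark recall.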
// TAGS
benchmark · evaluation · ai-coding · coding-agent · open-weights · llm · kimi-k2-6
DISCOVERED
2h ago
2026-05-03
PUBLISHED
6h ago
2026-05-03
RELEVANCE
9/10
AUTHOR
bazlightyear