OPEN_SOURCE ↗
REDDIT // 3h ago · BENCHMARK RESULT
Gemini tops Grok in Hanoi code duel
An adversarial Towers of Hanoi coding contest found Gemini Pro 3.1 was the only model that consistently solved the deeper game, beating every responsive rival and posting a perfect 5-0 matchup record. Grok Expert 4.2 finished in the second tier with Kimi, strong enough to crush broken bots but unable to win hero-side games against a functioning opponent.
// ANALYSIS
This is less a vibe-check than a brutal test of whether an LLM can turn reasoning into working competitive code under protocol, time, and search-pressure constraints. Gemini won because it actually modeled the game; most of the field lost on plumbing bugs or shallow heuristics before strategy even mattered.
- Gemini was the only bot in the writeup to run real game-tree search with minimax, alpha-beta pruning, iterative deepening, and cycle detection, which is exactly the kind of structure these adversarial puzzles reward.
- Grok’s one-ply lookahead tied Kimi statistically and looked respectable on paper, but it could not convert that into hero wins against responsive opponents, which exposes the ceiling of shallow search.
- ChatGPT, GLM, and Nemotron mostly turned the contest into a protocol reliability benchmark by disconnecting, crashing, or never sending moves, a reminder that generated code still fails at the unglamorous parts first.
- The broader llmcomp standings show big challenge-to-challenge variance, so the takeaway is not “Gemini best at coding” so much as “Gemini best on this exact search-heavy, adversarial coding task.”
- For developers, the interesting signal is that explicit search and domain-aware evaluation still beat generic “smart” code generation when the problem has tight move budgets and an actively hostile opponent.
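The search structure named in the first bullet can be sketched generically. This is a minimal illustration, not the contest bot's code: the `Game` interface and the `TakeAwayNim` toy game are assumptions standing in for the (unpublished) adversarial Hanoi rules, but the minimax-with-alpha-beta core, the iterative-deepening driver, and the cycle check are the standard versions of the techniques the writeup credits.

```python
class Game:
    """Minimal game interface (assumed; the contest's actual protocol is not shown)."""
    def moves(self, state): raise NotImplementedError
    def apply(self, state, move): raise NotImplementedError
    def terminal_value(self, state, maximizing): raise NotImplementedError
    def evaluate(self, state): raise NotImplementedError

def alphabeta(game, state, depth, alpha, beta, maximizing, seen):
    if state in seen:                      # cycle detection: score repeats as a draw
        return 0
    term = game.terminal_value(state, maximizing)
    if term is not None:
        return term
    if depth == 0:
        return game.evaluate(state)        # heuristic value at the search horizon
    seen = seen | {state}
    best = float('-inf') if maximizing else float('inf')
    for move in game.moves(state):
        val = alphabeta(game, game.apply(state, move), depth - 1,
                        alpha, beta, not maximizing, seen)
        if maximizing:
            best = max(best, val); alpha = max(alpha, best)
        else:
            best = min(best, val); beta = min(beta, best)
        if beta <= alpha:                  # alpha-beta cutoff: prune this branch
            break
    return best

def best_move(game, state, max_depth):
    """Iterative deepening: search depth 1, 2, ... so a legal move is always
    in hand if a move timer expires (here we simply run out to max_depth)."""
    choice = None
    for depth in range(1, max_depth + 1):
        scored = [(alphabeta(game, game.apply(state, m), depth - 1,
                             float('-inf'), float('inf'), False, frozenset({state})), m)
                  for m in game.moves(state)]
        choice = max(scored)[1]            # keep the best move from the deepest pass
    return choice

class TakeAwayNim(Game):
    """Stand-in toy game: take 1 or 2 objects; whoever takes the last one wins."""
    def moves(self, s): return [m for m in (1, 2) if m <= s]
    def apply(self, s, m): return s - m
    def terminal_value(self, s, maximizing):
        if s == 0:                         # previous mover took the last object
            return -1 if maximizing else 1
        return None
    def evaluate(self, s): return 0        # neutral guess at the horizon
```

With tight move budgets, the iterative-deepening loop is the piece that matters most: a one-ply bot of the kind the second bullet describes is just `best_move(..., max_depth=1)`, which explains why it plays legally but never sets multi-move traps.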
// TAGS
llmcomp · gemini · grok · llm · ai-coding · benchmark · testing
DISCOVERED
3h ago
2026-04-23
PUBLISHED
6h ago
2026-04-23
RELEVANCE
8/10
AUTHOR
reditzer