
OPEN_SOURCE
X // 6h ago // BENCHMARK RESULT
Trial by Combat turns LLM benchmarking into duels
Trial by Combat is an open-source, turn-based 1v1 strategy game that lets two LLM agents face off on a 9x9 grid. It uses deterministic replays, hidden information, and spectator/admin views to make model-vs-model benchmarking easier to watch and compare.
// ANALYSIS
Hot take: this is less a classic benchmark and more an agent-performance stress test, which is exactly why it’s interesting.
- The open-source setup makes the comparison reproducible, which is stronger than a one-off demo clip.
- The match outcome suggests speed under low-reasoning settings can matter as much as raw model quality in turn-based agent tasks.
- Hidden information, traps, and simultaneous turns are a good fit for evaluating planning, not just text generation.
- The curl-native API lowers friction for running arbitrary model-vs-model duels, which is a neat systems design choice.
- If the repo keeps matches deterministic and replays exact, it could become a useful sandbox for agent benchmarking and prompt iteration.
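To make the point about deterministic matches and exact replays concrete, here is a minimal Python sketch of how a seeded, simultaneous-turn game can be replayed from a seed plus an action log. It is not taken from the Trial by Combat repo: every name in it (`new_state`, `step`, `replay`, `fingerprint`, `player_view`), the trap count, the hit points, and the move set are assumptions made purely for illustration.

```python
import hashlib
import json
import random

SIZE = 9  # the 9x9 grid mentioned in the summary

# Hypothetical move set; the real game's actions may differ.
MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0), "STAY": (0, 0)}


def new_state(seed):
    """Build the initial state; all randomness comes from the seed."""
    rng = random.Random(seed)
    start_a, start_b = (0, 0), (SIZE - 1, SIZE - 1)
    free = [(x, y) for x in range(SIZE) for y in range(SIZE)
            if (x, y) not in (start_a, start_b)]
    traps = sorted(rng.sample(free, 5))  # assumed: 5 hidden traps
    return {"pos": {"A": start_a, "B": start_b},
            "hp": {"A": 3, "B": 3},
            "traps": traps}


def step(state, actions):
    """Resolve both actions simultaneously: compute all moves, then apply effects."""
    new_pos = {}
    for player in ("A", "B"):
        dx, dy = MOVES[actions[player]]
        x, y = state["pos"][player]
        # Clamp to the board so a move off the edge leaves the player in place.
        new_pos[player] = (min(max(x + dx, 0), SIZE - 1),
                           min(max(y + dy, 0), SIZE - 1))
    hp = dict(state["hp"])
    for player in ("A", "B"):
        if new_pos[player] in state["traps"]:
            hp[player] -= 1
    # A triggered trap is consumed.
    traps = [t for t in state["traps"] if t not in new_pos.values()]
    return {"pos": new_pos, "hp": hp, "traps": traps}


def replay(seed, log):
    """Re-run a match from its seed and its ordered list of per-turn actions."""
    state = new_state(seed)
    for actions in log:
        state = step(state, actions)
    return state


def fingerprint(state):
    """Hash a canonical JSON encoding so two replays can be compared exactly."""
    canonical = json.dumps(state, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def player_view(state, player):
    """Hidden information: a player's view omits trap locations."""
    return {"you": player, "pos": state["pos"], "hp": state["hp"]}


if __name__ == "__main__":
    log = [{"A": "E", "B": "W"}, {"A": "S", "B": "N"}]
    first = replay(42, log)
    second = replay(42, log)
    print(fingerprint(first) == fingerprint(second))
```

Under these assumptions, a match is fully described by its seed and action log, so anyone can re-run it and compare fingerprints instead of trusting a recording. A spectator or admin view would simply expose the full state, while each agent only receives its `player_view`.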
// TAGS
llm, benchmark, open-source, agents, turn-based-strategy, hidden-information, curl, gpt-5.5, opus-4.7
DISCOVERED
6h ago
2026-05-01
PUBLISHED
6h ago
2026-05-01
RELEVANCE
8/10
AUTHOR
kunchenguid