
Trial by Combat turns LLM benchmarking into duels
Trial by Combat is an open-source, turn-based 1v1 strategy game that lets two LLM agents face off on a 9x9 grid. It uses deterministic replays, hidden information, and spectator/admin views to make model-vs-model benchmarking easier to watch and compare.
Hot take: this is less a classic benchmark and more an agent-performance stress test, which is exactly why it’s interesting.
- The open-source setup makes the comparison reproducible, which is stronger than a one-off demo clip.
- The match outcome suggests speed under low-reasoning settings can matter as much as raw model quality in turn-based agent tasks.
- Hidden information, traps, and simultaneous turns are a good fit for evaluating planning, not just text generation.
- The curl-native API lowers friction for running arbitrary model-vs-model duels, which is a neat systems design choice.
- If the repo keeps matches deterministic and replays exact, it could become a useful sandbox for agent benchmarking and prompt iteration.
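To make the determinism point concrete, here is a minimal sketch of what "deterministic matches, exact replays" could look like on a 9x9 grid with simultaneous turns. It is not taken from the Trial by Combat repo: the state layout, the move names, and the functions `step`, `play`, `replay`, and `fingerprint` are all hypothetical, and seeded random movers stand in for the LLM agents. The idea it illustrates is that the engine is a pure function of the prior state and both moves, so a recorded move log is enough to rebuild a match exactly.

```python
import hashlib
import json
import random

GRID = 9  # 9x9 board, as in the summary; everything else here is an assumption

# Hypothetical move set: four directions plus waiting in place.
MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0), "stay": (0, 0)}


def initial_state():
    """Start the two agents in opposite corners (an illustrative choice)."""
    return {"a": (0, 0), "b": (GRID - 1, GRID - 1), "turn": 0}


def step(state, move_a, move_b):
    """Resolve both moves simultaneously against the same prior state."""
    new = dict(state)
    for key, move in (("a", move_a), ("b", move_b)):
        dx, dy = MOVES[move]
        x, y = state[key]  # read from the old state so neither agent moves first
        new[key] = (min(GRID - 1, max(0, x + dx)), min(GRID - 1, max(0, y + dy)))
    new["turn"] = state["turn"] + 1
    return new


def play(seed, turns=20):
    """Run a match with seeded random movers standing in for LLM agents."""
    rng = random.Random(seed)
    names = sorted(MOVES)  # fixed order so the seed fully determines the choices
    state, log = initial_state(), []
    for _ in range(turns):
        move_a, move_b = rng.choice(names), rng.choice(names)
        log.append((move_a, move_b))
        state = step(state, move_a, move_b)
    return state, log


def replay(log):
    """Rebuild the final state from the recorded moves alone."""
    state = initial_state()
    for move_a, move_b in log:
        state = step(state, move_a, move_b)
    return state


def fingerprint(state):
    """Hash a state so two runs can be compared with a single string."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


if __name__ == "__main__":
    final, log = play(seed=7)
    print(fingerprint(final) == fingerprint(replay(log)))
```

With real models the agents themselves are not reproducible, so in a design like this the move log, not the seed, is what makes a replay exact: the engine only has to be deterministic given the moves it was handed.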
DISCOVERED 2026-05-01
PUBLISHED 2026-05-01
AUTHOR kunchenguid