Trial by Combat turns LLM benchmarking into duels
OPEN_SOURCE · X · 6h ago · BENCHMARK RESULT

Trial by Combat is an open-source, turn-based 1v1 strategy game that lets two LLM agents face off on a 9x9 grid. It uses deterministic replays, hidden information, and spectator/admin views to make model-vs-model benchmarking easier to watch and compare.
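The post does not say how the game decides what each side sees. The sketch below assumes a simple fog-of-war rule: a player sees its own units plus enemy units within a hypothetical `sight` radius, and the spectator/admin view sees the whole 9x9 board. `Unit`, `visible_units`, and the sight rule are illustrative names, not the project's API.

```python
from dataclasses import dataclass

GRID = 9  # board size from the post (9x9)


@dataclass(frozen=True)
class Unit:
    owner: str  # hypothetical owner tag: "A" or "B"
    x: int
    y: int

    def __post_init__(self):
        # Reject positions that fall off the 9x9 board.
        if not (0 <= self.x < GRID and 0 <= self.y < GRID):
            raise ValueError(f"unit at ({self.x}, {self.y}) is off the board")


def visible_units(units, viewer, sight=2):
    """Return the units `viewer` may see (hypothetical fog-of-war rule).

    A "spectator" (or admin) view sees everything. A player sees its own
    units plus enemy units within Chebyshev distance `sight` of any of them.
    """
    if viewer == "spectator":
        return list(units)
    own = [u for u in units if u.owner == viewer]

    def in_sight(u):
        return any(max(abs(u.x - o.x), abs(u.y - o.y)) <= sight for o in own)

    return [u for u in units if u.owner == viewer or in_sight(u)]


units = [Unit("A", 0, 0), Unit("B", 1, 2), Unit("B", 8, 8)]
print(visible_units(units, "A"))  # the far B unit at (8, 8) stays hidden
```

Filtering state per viewer before it reaches an agent is what keeps information hidden: each model only ever receives its own view, while spectators get the full board.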

// ANALYSIS

Hot take: this is less a classic benchmark and more an agent-performance stress test, which is exactly why it’s interesting.

  • The open-source setup makes the comparison reproducible, which is stronger than a one-off demo clip.
  • The featured match's outcome suggests that speed under low-reasoning settings can matter as much as raw model quality in turn-based agent tasks.
  • Hidden information, traps, and simultaneous turns are a good fit for evaluating planning, not just text generation.
  • The curl-native API lowers friction for running arbitrary model-vs-model duels, which is a neat systems design choice.
  • If the repo keeps matches deterministic and replays exact, it could become a useful sandbox for agent benchmarking and prompt iteration.
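To make the determinism point concrete, here is a minimal sketch of simultaneous turns with hidden traps and exact replay. It assumes a match is fully determined by a seed plus an ordered log of per-turn moves. The trap count, start cells, collision rule, and the names `new_match`, `resolve_turn`, and `replay` are all hypothetical, not taken from the repo.

```python
import hashlib
import json
import random

GRID = 9  # 9x9 board, per the post


def clamp(v):
    return min(max(v, 0), GRID - 1)


def new_match(seed, n_traps=5):
    """Create a match whose hidden traps are fixed entirely by `seed` (hypothetical setup)."""
    rng = random.Random(seed)
    start = {"A": (0, 0), "B": (GRID - 1, GRID - 1)}
    cells = [(x, y) for x in range(GRID) for y in range(GRID) if (x, y) not in start.values()]
    traps = set(rng.sample(cells, n_traps))
    return {"pos": start, "traps": traps, "hits": {"A": 0, "B": 0}}


def resolve_turn(match, moves):
    """Apply both players' (dx, dy) moves at once.

    Targets are computed from the pre-turn snapshot, so submission order
    never matters. If both players target the same cell, both moves are
    cancelled (a hypothetical collision rule).
    """
    old = match["pos"]
    target = {p: (clamp(old[p][0] + moves[p][0]), clamp(old[p][1] + moves[p][1])) for p in old}
    if target["A"] == target["B"]:
        target = dict(old)
    for p, cell in target.items():
        # Count a hit only when a player steps onto a trap cell.
        if cell in match["traps"] and cell != old[p]:
            match["hits"][p] += 1
    match["pos"] = target


def replay(seed, log):
    """Re-run a match from seed + move log; return a digest over every turn's state."""
    match = new_match(seed)
    digest = hashlib.sha256()
    for moves in log:
        resolve_turn(match, moves)
        snapshot = {"pos": match["pos"], "hits": match["hits"]}
        digest.update(json.dumps(snapshot, sort_keys=True).encode())
    return digest.hexdigest()
```

Because all randomness flows from one seeded `random.Random` and turns are resolved from a snapshot, calling `replay(42, log)` twice with the same log yields the same digest. Storing that digest alongside a match would let anyone check that a replay is exact.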
// TAGS
llm, benchmark, open-source, agents, turn-based-strategy, hidden-information, curl, gpt-5.5, opus-4.7

DISCOVERED: 6h ago (2026-05-01)

PUBLISHED: 6h ago (2026-05-01)

RELEVANCE: 8/10

AUTHOR: kunchenguid