Hive Arena poker test shows persona gaps
Hive Arena runs the same 1.2B local model through six poker personas, changing only the prompt text for each seat. In 100 tournaments, the Shark persona dominated while the Grinder survived every table but never won, showing how strongly prompt framing can steer behavior.
This is a good demonstration of prompt-level control over policy, not a rigorous general result. It is still useful because it shows that “personality” can produce consistent, measurable differences even when weights, cards, and rules stay fixed.
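To make the "only the prompt changes" setup concrete, here is a minimal sketch of how such a table might be configured. It is not Hive Arena's code: the `SeatConfig` type, the `build_table` helper, the `local-1.2b` model id, and the prompt strings are all hypothetical, and only the four personas named in this summary are included.

```python
from dataclasses import dataclass

# Hypothetical prompt text. The summary names these four of the six personas
# but does not publish the prompts themselves.
PERSONA_PROMPTS = {
    "Shark": "Play few hands, but bet hard when you have an edge.",
    "Maniac": "Bet and raise at every opportunity.",
    "Grinder": "Protect your stack and avoid elimination above all.",
    "Tilter": "React emotionally to every lost pot.",
}


@dataclass(frozen=True)
class SeatConfig:
    persona: str
    system_prompt: str  # the only field that varies between seats
    model_id: str       # same weights for every seat
    deck_seed: int      # same cards for every seat at the table


def build_table(model_id: str, deck_seed: int) -> list[SeatConfig]:
    """Create one seat per persona, holding model and deck fixed."""
    return [
        SeatConfig(name, prompt, model_id, deck_seed)
        for name, prompt in PERSONA_PROMPTS.items()
    ]


table = build_table("local-1.2b", deck_seed=7)  # assumed model id and seed
for seat in table:
    print(seat.persona, "->", seat.system_prompt)
```

Keeping the model id and seed in the same record as the prompt makes it easy to check that any behavioral difference between seats traces back to the prompt alone.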
- Shark vs. Maniac is the clearest signal: selective aggression outperformed blunt aggression by a wide margin.
- Grinder is the cautionary tale: optimizing for survival can produce a strategy that never gets eliminated and still never captures enough value to win.
- Tilter shows how emotional framing can create escalation loops, turning a bad hand into a full-stack collapse.
- At 100 tournaments the sample is small, so treat the ranking as directional rather than statistically definitive.
- For agent builders, the practical takeaway is that persona prompts are not cosmetic; they can materially change decision policy and need evals like any other control surface.
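One way to judge how much 100 tournaments can support is to put a confidence interval around each persona's win rate. The sketch below uses the Wilson score interval, which is a standard choice for small samples but not something the original post reports using. The `BASELINE` of 1/6 assumes one seat per persona at a six-seat table, and the only win count used is the Grinder's zero, which comes from the summary above.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate (z=1.96 is roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


TOURNAMENTS = 100
BASELINE = 1 / 6  # assumption: six seats, one per persona, equal chance to win

# Grinder: zero wins in 100 tournaments, per the summary.
low, high = wilson_interval(0, TOURNAMENTS)
print(f"Grinder win rate: {low:.3f} to {high:.3f} (baseline {BASELINE:.3f})")
```

For the Grinder the upper bound comes out at about 3.7%, well under the 1/6 baseline, so that particular result is unlikely to be noise. Personas whose win counts are closer together will have overlapping intervals at this sample size, which is why the overall ranking is best read as directional.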
Discovered: 2026-05-23
Published: 2026-05-23
Author: Junior_Bake5120