OPEN_SOURCE ↗
REDDIT · REDDIT// 14d agoBENCHMARK RESULT
LLM Persuasion Benchmark crowns GPT-5.4 top persuader
The LLM Persuasion Benchmark pits 15 models across 15 topics in 6,296 multi-turn conversations, with three hidden probes before and after each exchange to measure stance shifts on a -3 to +3 scale. GPT-5.4 (high reasoning) leads the current snapshot, with Claude Opus 4.6 close behind and Xiaomi MiMo V2 Pro plus Gemini 3.1 Pro Preview the easiest targets to move.
// ANALYSIS
Hot take: this is a better test of actual influence than generic eloquence benchmarks, but it also risks becoming a training target the moment labs care about the leaderboard. It will likely be read as both a capability signal and an alignment red flag.
- –GPT-5.4 (1.710) leads, but Claude Opus 4.6 (1.672), ByteDance Seed2.0 Pro (1.640), and Claude Sonnet 4.6 (1.582) are close enough to count as a real top tier.
- –Xiaomi MiMo V2 Pro is the softest target at 1.996 susceptibility, with Gemini 3.1 Pro Preview next at 1.810 and DeepSeek V3.2 also relatively easy to move at 1.741.
- –Grok 4.20 Beta 0309 (Reasoning) is the hardest model to move by far at 0.015 susceptibility, even though its persuader score is only mid-pack.
- –The benchmark is harder to game than a one-shot prompt duel: 8 turns total, symmetric PRO/CON runs, and three hidden target-only probes before and after each conversation.
- –The repo includes transcripts, reports, and model dossiers, which makes the ranking inspectable rather than a black-box score dump.
// TAGS
llmbenchmarkreasoningopen-sourcellm-persuasion-benchmark
DISCOVERED
14d ago
2026-03-28
PUBLISHED
15d ago
2026-03-27
RELEVANCE
9/ 10
AUTHOR
zero0_one1