LLM Persuasion Benchmark crowns GPT-5.4 top persuader

// 60d agoBENCHMARK RESULT

LLM Persuasion Benchmark crowns GPT-5.4 top persuader

The LLM Persuasion Benchmark pits 15 models across 15 topics in 6,296 multi-turn conversations, with three hidden probes before and after each exchange to measure stance shifts on a -3 to +3 scale. GPT-5.4 (high reasoning) leads the current snapshot, with Claude Opus 4.6 close behind and Xiaomi MiMo V2 Pro plus Gemini 3.1 Pro Preview the easiest targets to move.

// ANALYSIS

Hot take: this is a better test of actual influence than generic eloquence benchmarks, but it also risks becoming a training target the moment labs care about the leaderboard. It will likely be read as both a capability signal and an alignment red flag.

–GPT-5.4 (1.710) leads, but Claude Opus 4.6 (1.672), ByteDance Seed2.0 Pro (1.640), and Claude Sonnet 4.6 (1.582) are close enough to count as a real top tier.
–Xiaomi MiMo V2 Pro is the softest target at 1.996 susceptibility, with Gemini 3.1 Pro Preview next at 1.810 and DeepSeek V3.2 also relatively easy to move at 1.741.
–Grok 4.20 Beta 0309 (Reasoning) is the hardest model to move by far at 0.015 susceptibility, even though its persuader score is only mid-pack.
–The benchmark is harder to game than a one-shot prompt duel: 8 turns total, symmetric PRO/CON runs, and three hidden target-only probes before and after each conversation.
–The repo includes transcripts, reports, and model dossiers, which makes the ranking inspectable rather than a black-box score dump.

// TAGS

llmbenchmarkreasoningopen-sourcellm-persuasion-benchmark

DISCOVERED

60d ago

2026-03-28

PUBLISHED

61d ago

2026-03-27

RELEVANCE

9/ 10

AUTHOR

zero0_one1

// KEEP READING

More AI developer news from the feed

EXPLORE FULL FEED

UPDATE1h ago

Cursor adds dedicated subagents for skills

Cursor now allows developers to execute tool-heavy or research-intensive agent skills within dedicated subagents. This architectural shift isolates noisy background tasks, keeping the main chat context clean and focused.

UPDATE1h ago

YouTube moves AI labels to video player

YouTube is moving its AI content disclosures from video descriptions to more prominent placements beneath the player and on Shorts overlays. Starting in May, the platform will use internal signals to automatically label photorealistic AI content that creators fail to disclose.

OPEN SOURCE5h ago

Taste Skill kills AI "frontend slop"

Taste-Skill is an open-source framework that provides portable "agent skills" to enforce high-end design principles in AI-generated code. By injecting specific design directives and "anti-slop" rules, it enables LLMs to produce editorial-grade UIs that bypass generic, boilerplate-heavy AI templates.