Elephant Alpha lands mixed EQ-Bench v3 scores

// BENCHMARK RESULT

A local EQ-Bench v3 run places Elephant Alpha in the middle of the pack: strong on the analytic subtest, respectable on the moral one, and weak on the human-facing slice. Overall it lands near GPT-4.5-preview and o4-mini, only a hair above DeepSeek-V3-0324.

// ANALYSIS

My read is that Elephant Alpha looks like a sharp, structured model rather than a naturally warm one: it can reason through emotionally loaded prompts, but it does not turn that into standout human rapport.

– The 8.2 analytic score is the clearest signal here: the model seems better at parsing scenarios than at performing empathy theater.
– The 4.3 human score, paired with a 2.6 sycophancy score, suggests it resists glazing but may feel flatter than chat-first models.
– A 5.4 moral score is respectable for a stealth model and keeps it from looking one-dimensional.
– The gap to DeepSeek-V3-0324 is small enough that this reads as incremental progress, not a leaderboard breakout.
– EQ-Bench v3 is Opus-judged and Elo-based, so the absolute numbers matter less than the shape of the profile.
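Why Elo-style numbers are relative can be shown in a few lines. This is a minimal sketch using the textbook Elo expected-score formula (logistic curve, 400-point scale); it assumes EQ-Bench v3's ranking behaves like standard Elo, which the post does not spell out, and the model names and ratings are hypothetical, not published EQ-Bench figures.

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Textbook Elo expected score of A against B (assumed, not EQ-Bench's exact code)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


# Hypothetical ratings for illustration only.
ratings = {"model_a": 1250.0, "model_b": 1200.0}
shift = 300.0  # offset applied to every model's rating

# Expected score before and after shifting both ratings by the same amount.
before = elo_expected(ratings["model_a"], ratings["model_b"])
after = elo_expected(ratings["model_a"] + shift, ratings["model_b"] + shift)

# Only the rating difference enters the formula, so both values are identical.
print(f"{before:.4f} {after:.4f}")
```

Because only the difference between two ratings affects the predicted outcome, a leaderboard can re-anchor every score without changing any head-to-head expectation; the gaps between models carry the information, not the raw values.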

// TAGS

elephant-alpha · llm · benchmark · reasoning · ethics · safety

DISCOVERED: 2026-04-16
PUBLISHED: 2026-04-16
RELEVANCE: 8/10
AUTHOR: nivvis