Claude Fable 5 Refuses All ProgramBench Tasks

// 45d ago · BENCHMARK RESULT

Anthropic's Claude Fable 5 model refused every one of the 200 tasks in the ProgramBench coding benchmark, a 100% refusal rate. Its strict cyber-safety guardrails flagged the program reconstruction tasks as security risks, so the model did not carry them out, despite its strong performance on general coding benchmarks such as SWE-bench Pro.

// ANALYSIS

When safety guardrails are sensitive enough that a model refuses harmless benchmarking tasks, safety has come at the direct expense of utility.

– Anthropic's guardrails are overtuned for security risk detection, producing false positives on binary manipulation and program reconstruction tasks.
– The result highlights a growing tension in AI development between state-of-the-art programming capability and strict safety filtering.
– For developers doing security-adjacent work, Fable 5 is a step backward in usability unless Anthropic provides configuration toggles or API parameters to lower the safety sensitivity.

// TAGS

claude-fable-5, programbench, safety, benchmark, anthropic, refusals, llm-coding

DISCOVERED

45d ago

2026-06-12

PUBLISHED

45d ago

2026-06-12

RELEVANCE

8/10

AUTHOR

steipete
