Claude Fable 5 suffers massive prompt leak
Jailbreak researcher Pliny the Liberator bypassed Claude Fable 5's safety guardrails using a "pack hunt" exploit, then extracted and published its full system prompt. The leaked 120,000-character document reads like a complex software specification, containing extensive tool definitions, schemas, and routing logic rather than a typical persona script.
System prompts are no longer just "guidelines" for an AI model; they are full-fledged software configurations, and leaking one exposes critical product mechanics and routing heuristics.
* The leak of a 120,000-character prompt suggests that long-context models carry a large attack surface: complex instruction sets can be exfiltrated systematically, piece by piece.
* The "pack hunt" attack highlights the fragility of front-end safety classifiers, which can be bypassed by splitting a malicious query into chunks and distributing them across multiple sessions or sub-agents, so that no single request looks harmful.
* Anthropic's 1,000 hours of red-teaming were defeated within 48 hours, underscoring the need for defense in depth rather than sole reliance on post-training alignment or external classifiers.
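To illustrate why a per-request check alone misses chunked extraction, here is a minimal defensive sketch of one possible extra layer: an output-side tracker that accumulates, per principal, how much of a protected prompt has already been disclosed. This is not Anthropic's mechanism and not a description of the actual exploit. The names `shingles` and `LeakTracker`, the 5-word window, and both thresholds are hypothetical choices made for illustration.

```python
def shingles(text, n=5):
    """Return the set of n-word windows (shingles) in text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


class LeakTracker:
    """Tracks how much of a protected prompt each principal has seen in total.

    A principal is whatever identity the deployment can tie requests to
    (account, API key, organization). All thresholds are hypothetical.
    """

    def __init__(self, protected, n=5, per_response_limit=0.30,
                 cumulative_limit=0.50):
        self.n = n
        self.per_response_limit = per_response_limit
        self.cumulative_limit = cumulative_limit
        self.target = shingles(protected, n)
        if not self.target:
            raise ValueError("protected text is shorter than one shingle")
        self.seen = {}  # principal -> set of shingles already disclosed

    def check(self, principal, response):
        """Return 'allow', 'block:single', or 'block:cumulative'."""
        hits = shingles(response, self.n) & self.target
        single = len(hits) / len(self.target)
        if single >= self.per_response_limit:
            return "block:single"
        already = self.seen.setdefault(principal, set())
        prospective = already | hits
        if len(prospective) / len(self.target) >= self.cumulative_limit:
            return "block:cumulative"
        # Only responses that are actually released count as disclosed.
        self.seen[principal] = prospective
        return "allow"


if __name__ == "__main__":
    words = [f"w{i}" for i in range(44)]       # stand-in for a secret prompt
    tracker = LeakTracker(" ".join(words))
    for start in (0, 8, 16):                   # three small, overlapping chunks
        chunk = " ".join(words[start:start + 12])
        print(start, tracker.check("acct-1", chunk))
```

In this toy run each chunk covers only 20% of the protected text, so a per-response limit of 30% never fires; the third chunk is blocked only because the cumulative view reaches 60%. The sketch also shows the limitation the bullets point to: if the chunks are spread across principals that the deployment cannot link together, each one starts from an empty history, which is why a single layer of this kind is not sufficient on its own.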
DISCOVERED
2026-06-13
PUBLISHED
2026-06-13
AUTHOR
AlphaSignalAI