Arc Sentry flags Crescendo jailbreaks internally
Arc Sentry by Bendex Geometry is a white-box behavioral guardrail that monitors LLM internal states to block multi-turn attacks like Crescendo. By analyzing shifts in the model's residual stream before output generation, it catches subtle adversarial steering that bypasses standard text-based monitors.
Arc Sentry’s shift from external text monitoring to internal state observation is a breakthrough for defending against sophisticated multi-turn attacks. It successfully flagged the Crescendo multi-turn attack at Turn 3 while LLM Guard failed to detect any of the 8 turns. By hooking directly into model layers to calculate delta shifts, it provides a white-box safety layer that operates before token generation begins. Zero-shot calibration requires only 5 normal prompts, making it accessible for rapid deployment in single-domain enterprise environments. The tool reports a 100% detection rate and 0% false positives on major models like Mistral 7B and Llama 3.1, with Pro pricing at $199/mo.
DISCOVERED
3h ago
2026-04-15
PUBLISHED
6h ago
2026-04-14
RELEVANCE
AUTHOR
Turbulent-Tap6723