OPEN_SOURCE ↗
REDDIT · REDDIT// 4d agoRESEARCH PAPER
OpenClaw safety study reveals structural agent vulnerabilities
New research evaluates OpenClaw's security, introducing a "CIK" (Capability, Identity, Knowledge) taxonomy for persistent agent state. Poisoning just one dimension of an agent's state can boost attack success rates from 24% to over 64%, even for top-tier models like GPT-5.4.
// ANALYSIS
The paper argues that current agent safety is overly reliant on prompt-level alignment, which fails once an agent's "state" is compromised. We need a deterministic execution-time control layer, not just better monitoring.
- –CIK poisoning (Capability, Identity, Knowledge) is a devastatingly effective attack vector for persistent agents.
- –Even the strongest models (Claude Opus 4.6, GPT-5.4) see vulnerability increases of 3x+ under state compromise.
- –Proposed "proposal -> authorization -> execution" model moves security from probabilistic alignment to deterministic policy.
- –Baseline success rates for attacks on OpenClaw are already alarmingly high (~10–37%) even without poisoning.
- –File-level protection is too restrictive for practical use, blocking 97% of attacks but also 97% of legitimate updates.
// TAGS
openclawagentsafetysecurityresearchllm
DISCOVERED
4d ago
2026-04-08
PUBLISHED
4d ago
2026-04-07
RELEVANCE
9/ 10
AUTHOR
docybo