OpenClaw safety study reveals structural agent vulnerabilities
REDDIT // 4d ago · RESEARCH PAPER

New research evaluates OpenClaw's security and introduces "CIK" (Capability, Identity, Knowledge), a taxonomy of persistent agent state. Poisoning just one of these dimensions can raise attack success rates from 24% to over 64%, even against top-tier models like GPT-5.4.

// ANALYSIS

The paper argues that current agent safety is overly reliant on prompt-level alignment, which fails once an agent's "state" is compromised. We need a deterministic execution-time control layer, not just better monitoring.

  • CIK poisoning (Capability, Identity, Knowledge) is a devastatingly effective attack vector for persistent agents.
  • Even the strongest models (Claude Opus 4.6, GPT-5.4) become more than 3x as vulnerable under state compromise.
  • The proposed "proposal -> authorization -> execution" model moves security from probabilistic alignment to deterministic policy.
  • Baseline success rates for attacks on OpenClaw are already alarmingly high (~10–37%) even without poisoning.
  • File-level protection is too restrictive for practical use, blocking 97% of attacks but also 97% of legitimate updates.
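
The "proposal -> authorization -> execution" idea from the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from the paper or from OpenClaw: the names Proposal, POLICY, authorize, and execute, the example tools, and the path prefixes are all hypothetical. The point it tries to show is that the allow/deny decision depends only on the proposed action and a static policy, never on model output or agent memory.

```python
import posixpath
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical names throughout; none of this is an OpenClaw or paper API.

@dataclass(frozen=True)
class Proposal:
    """An action the agent asks for. It is a request, not a command."""
    tool: str
    target: str

# Static allow-list (assumed for illustration): tool -> permitted path prefixes.
POLICY: Dict[str, Tuple[str, ...]] = {
    "read_file": ("/workspace/",),
    "write_file": ("/workspace/notes/",),
}

def authorize(p: Proposal) -> bool:
    """Deterministic check: same proposal and policy always give the same answer."""
    # Normalize first so "../" segments cannot escape an allowed prefix.
    path = posixpath.normpath(p.target)
    return any(path.startswith(prefix) for prefix in POLICY.get(p.tool, ()))

def execute(p: Proposal, tools: Dict[str, Callable[[str], str]]) -> str:
    """Run a tool only after the proposal passes authorization."""
    if not authorize(p):
        return f"DENIED {p.tool} {p.target}"
    return tools[p.tool](posixpath.normpath(p.target))

# Stand-in tools so the sketch runs without touching the filesystem.
TOOLS: Dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"read {path}",
    "write_file": lambda path: f"wrote {path}",
}

allowed = execute(Proposal("write_file", "/workspace/notes/todo.md"), TOOLS)
escaped = execute(Proposal("write_file", "/workspace/notes/../../etc/passwd"), TOOLS)
unknown = execute(Proposal("shell", "rm -rf /"), TOOLS)
```

In this sketch a poisoned agent can still propose anything, but a tool that is missing from the policy, or a path outside the allowed prefixes, is refused before any side effect occurs. A real control layer would need a much richer policy than a prefix allow-list.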
// TAGS
openclaw agent safety security research llm

DISCOVERED: 4d ago (2026-04-08)
PUBLISHED: 4d ago (2026-04-07)
RELEVANCE: 9/10
AUTHOR: docybo