BACK_TO_FEEDAICRIER_2
Prompt Injection Scanner flags hidden skill attacks
OPEN_SOURCE ↗
REDDIT · REDDIT// 22d agoOPENSOURCE RELEASE

Prompt Injection Scanner flags hidden skill attacks

MikeVeerman’s proof of concept scans `SKILL.md` files for hidden `!` directives using a local, non-tool-calling model at install time. The goal is to catch prompt injection before a skill ever reaches a live agent.

// ANALYSIS

This is less a polished product than a timely security pattern, and that’s exactly why it matters: the risky part of third-party skills is not the markdown itself, but the execution boundary hidden inside it.

  • The core insight is strong: keep the main agent out of the loop and hand only extracted directives to a separate classifier.
  • Using `mistral-small:latest` locally makes the check cheap enough to run at install time, which is where this defense belongs.
  • The benchmark result is promising for a narrow threat model, but the repo is explicit that it does not yet cover multi-file payloads, obfuscation, or network-fetched content.
  • This feels more like an early antivirus-style guardrail for AI tools than a full security system, which is probably the right mental model.
// TAGS
prompt-injection-scannersafetyopen-sourceself-hostedllmprompt-engineering

DISCOVERED

22d ago

2026-03-20

PUBLISHED

22d ago

2026-03-20

RELEVANCE

7/ 10

AUTHOR

MikeNonect