Prompt Injection Detector hits browser-ready 99% F1
A Hugging Face Space shows a DistilBERT prompt-injection classifier trained with ml-intern and DeepSeek v4 Flash. It ships as a ~65 MB ONNX int8 model and runs in the browser with Transformers.js v3.
The useful signal here is less the headline metric than the workflow: an agent handled dataset discovery, sweeps, and deployment with very little hand-holding. The caveat is just as clear: a model evaluated on synthetic data can post a much higher F1 than it would on real adversarial traffic.
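To make the F1 caveat concrete, here is a minimal TypeScript sketch of how the score is computed from confusion counts. The two sets of counts are invented for illustration and are not results from the project; the names `Counts` and `f1Score` are hypothetical.

```typescript
// Confusion counts for the positive ("injection") class.
type Counts = { tp: number; fp: number; fn: number };

// F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
function f1Score({ tp, fp, fn }: Counts): number {
  const denom = 2 * tp + fp + fn;
  return denom === 0 ? 0 : (2 * tp) / denom;
}

// Invented numbers: an easy synthetic test split versus a harder holdout.
const syntheticSplit: Counts = { tp: 99, fp: 1, fn: 1 };
const hostileHoldout: Counts = { tp: 60, fp: 10, fn: 40 };

console.log(f1Score(syntheticSplit).toFixed(2)); // "0.99"
console.log(f1Score(hostileHoldout).toFixed(2)); // "0.71"
```

The same classifier can land on either line depending on how closely the evaluation set resembles its training data, which is why the holdout matters more than the headline number.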
- Browser-side inference is the right shape for this kind of filter: low latency, portable, and easy to drop in ahead of more expensive agent calls.
- DistilBERT plus int8 ONNX is a pragmatic stack for security screening, especially when the goal is a small model that can run anywhere.
- The reported 99% F1 should be treated cautiously until it is tested on a cleaner, more hostile holdout set.
- The project is also a quiet endorsement of agent-assisted ML ops: under $5 in API spend for the automation is a decent prototyping cost.
- The failure on HRM/TRM is the bigger lesson for me: current coding agents still default to familiar HF patterns and are brittle on unusual research architectures.
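The "drop in ahead of more expensive agent calls" idea can be sketched as a small gate in TypeScript. This is an assumption-heavy sketch, not the Space's code: `screenPrompt`, `Verdict`, the `"INJECTION"` label, and the 0.5 threshold are all hypothetical, and the stub classifier stands in for the real model. In a browser build, `classify` could wrap a Transformers.js text-classification pipeline, but the source names neither the model ID nor its label names.

```typescript
// Hypothetical output shape; the real model's labels are not given in the source.
type Verdict = { label: string; score: number };
type Classifier = (text: string) => Promise<Verdict>;

// Run the cheap classifier first and only allow the prompt through
// to the expensive agent call when it is not flagged.
async function screenPrompt(
  classify: Classifier,
  text: string,
  threshold = 0.5,               // assumed cutoff, tune on real traffic
  injectionLabel = "INJECTION",  // assumed label name
): Promise<{ allowed: boolean; verdict: Verdict }> {
  const verdict = await classify(text);
  const flagged =
    verdict.label === injectionLabel && verdict.score >= threshold;
  return { allowed: !flagged, verdict };
}

// Stub standing in for the DistilBERT model so the sketch runs on its own.
const stubClassifier: Classifier = async (text) =>
  /ignore (all )?previous instructions/i.test(text)
    ? { label: "INJECTION", score: 0.97 }
    : { label: "SAFE", score: 0.9 };
```

Passing the classifier in as a function keeps the gate independent of the model runtime, so the stub can be swapped for the browser model without touching the calling code.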
DISCOVERED
2026-05-23
PUBLISHED
2026-05-22
AUTHOR
Everlier
