OPEN_SOURCE ↗
REDDIT · REDDIT// 17d agoRESEARCH PAPER
ProAttack backdoor nears 100% with few samples
ProAttack is a clean-label, prompt-based backdoor attack that turns the prompt itself into the trigger instead of relying on obvious poisoned tokens or flipped labels. The researchers report near-100% attack success across multiple text-classification benchmarks, sometimes with as few as six poisoned samples.
// ANALYSIS
This is a sobering reminder that prompt engineering is becoming part of the security supply chain, not just the UI layer. The attack is cheap, stealthy, and effective enough that most current defenses look more like speed bumps than a stop sign.
- –Clean-label poisoning is harder to spot because the labels stay correct and the text still reads naturally.
- –The method held near-100% attack success across five datasets and five language models, and it also carried over to radiology report summarization.
- –Defenses like ONION, SCPD, back-translation, and fine-pruning helped inconsistently, and some of them hurt clean accuracy.
- –LoRA-style low-rank fine-tuning reduced attack success, but the defense depends on keeping rank low and tuning it per task.
- –Any workflow reusing shared prompt templates or synthetic data should treat prompt provenance as a real security control.
// TAGS
proattackllmprompt-engineeringsafetyresearchbenchmark
DISCOVERED
17d ago
2026-03-26
PUBLISHED
17d ago
2026-03-26
RELEVANCE
8/ 10
AUTHOR
tekz