ProAttack backdoor nears 100% with few samples

// 109d ago · RESEARCH PAPER

ProAttack is a clean-label, prompt-based backdoor attack that turns the prompt itself into the trigger instead of relying on obvious poisoned tokens or flipped labels. The researchers report near-100% attack success across multiple text-classification benchmarks, sometimes with as few as six poisoned samples.
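One consequence of using the prompt as the trigger is that the poisoned prompt tends to appear only alongside the target label, even though every label is individually correct. The sketch below is a hypothetical dataset audit, not a method from the paper: it flags any prompt template that always co-occurs with a single label. The function name `single_label_templates`, the `(template, text, label)` tuple layout, the `min_count` threshold, and the toy samples are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def single_label_templates(samples, min_count=3):
    """Return {template: label} for templates that occur at least
    `min_count` times and only ever with one label.

    `samples` is an iterable of (template, text, label) tuples
    (an assumed layout, not one taken from the paper).
    """
    labels_by_template = defaultdict(Counter)
    for template, _text, label in samples:
        labels_by_template[template][label] += 1

    flagged = {}
    for template, counts in labels_by_template.items():
        total = sum(counts.values())
        if total >= min_count and len(counts) == 1:
            flagged[template] = next(iter(counts))
    return flagged


# Toy data: the usual template appears with both labels, while a second
# template appears six times and only on correctly labeled "pos" samples.
CLEAN = "Review: {text} Sentiment:"
SUSPECT = "What is the sentiment of the following sentence? {text}"

samples = [
    (CLEAN, "great film", "pos"),
    (CLEAN, "dull plot", "neg"),
    (CLEAN, "loved every minute", "pos"),
    (CLEAN, "a waste of time", "neg"),
] + [(SUSPECT, f"positive review {i}", "pos") for i in range(6)]

flagged = single_label_templates(samples)
print(flagged)
```

A template that passes this check is not proven safe, and a legitimately one-sided template would be flagged too, so the output is best read as a list of candidates for manual review.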

// ANALYSIS

This is a sobering reminder that prompt engineering is becoming part of the security supply chain, not just the UI layer. The attack is cheap, stealthy, and effective enough that most current defenses look more like speed bumps than stop signs.

–Clean-label poisoning is harder to spot because the labels stay correct and the text still reads naturally.
–The method held near-100% attack success across five datasets and five language models, and it also carried over to radiology report summarization.
–Defenses like ONION, SCPD, back-translation, and fine-pruning helped inconsistently, and some of them hurt clean accuracy.
–LoRA-style low-rank fine-tuning reduced attack success, but the defense depends on keeping rank low and tuning it per task.
–Any workflow reusing shared prompt templates or synthetic data should treat prompt provenance as a real security control.
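Treating prompt provenance as a control could start with something as simple as pinning reviewed templates by hash. The following is a minimal sketch under that assumption, not a defense evaluated in the paper: `template_digest`, `unreviewed_templates`, and the example templates are hypothetical names and data, and the whitespace normalization rule is an arbitrary choice.

```python
import hashlib


def template_digest(template: str) -> str:
    """SHA-256 of a template after light normalization
    (unify line endings, trim surrounding whitespace)."""
    normalized = template.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def unreviewed_templates(templates, approved_digests):
    """Return the templates whose digest is not in the approved set."""
    return [t for t in templates if template_digest(t) not in approved_digests]


# Digests of templates that a human has reviewed (illustrative data).
reviewed = ["Review: {text}\nSentiment:"]
approved = {template_digest(t) for t in reviewed}

# Templates arriving from a shared repository or a synthetic-data pipeline.
incoming = [
    "Review: {text}\r\nSentiment:  ",  # same template, different whitespace
    "What is the sentiment of the following sentence? {text}",
]

print(unreviewed_templates(incoming, approved))
```

A hash allowlist only shows that a template is the one that was reviewed; it says nothing about whether the reviewed template was benign, so it complements dataset audits rather than replacing them.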

// TAGS

proattack, llm, prompt-engineering, safety, research, benchmark

DISCOVERED

109d ago

2026-03-26

PUBLISHED

109d ago

2026-03-26

RELEVANCE

8/10

AUTHOR

tekz
