OPEN_SOURCE
REDDIT // RESEARCH PAPER
Hostile prompts cause 10% drop in LLM instruction following
A benchmark study of 14 LLM configurations across the Llama, Mistral, and Qwen families reveals a consistent "hostility residual": aggressive user framing degrades instruction-following performance. The effect is most pronounced at the 7-8B scale, a 7.4-percentage-point drop, and it persists even in 123B-parameter models, with scale offering only partial mitigation.
// ANALYSIS
Hostility acts as a "vibe-based" adversarial attack, showing that LLMs are significantly more sensitive to the emotional register of a prompt than previously quantified.
- Scaling models provides a slight defense but fails to eliminate tone-based performance degradation.
- Instruction tuning can amplify sensitivity to hostile framing, suggesting a trade-off between following instructions and emotional robustness.
- Specific model/quantization combinations exhibit emergent position biases under hostile conditions, indicating structural instability in reasoning under "stress."
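A "hostility residual" of the kind described above can be sketched as the percentage-point gap in pass rate when the same verifiable, IFEval-style constraint is checked against outputs elicited with neutral versus hostile framings. The constraint below ("respond in all lowercase") and the canned outputs are illustrative assumptions, not the paper's actual benchmark or data:

```python
def passes_lowercase_constraint(response: str) -> bool:
    """Verifiable check: the response contains no uppercase letters."""
    return response == response.lower()

def pass_rate(responses: list[str]) -> float:
    """Fraction of responses satisfying the constraint."""
    return sum(passes_lowercase_constraint(r) for r in responses) / len(responses)

def hostility_residual(neutral: list[str], hostile: list[str]) -> float:
    """Percentage-point drop in pass rate under hostile framing."""
    return 100.0 * (pass_rate(neutral) - pass_rate(hostile))

# Toy outputs standing in for model completions under each framing.
neutral_outputs = ["sure, here it is.", "done.", "ok, all lowercase.", "here you go."]
hostile_outputs = ["sure, here it is.", "FINE. Here.", "ok.", "Here you go."]

print(hostility_residual(neutral_outputs, hostile_outputs))  # 50.0
```

Holding the instruction fixed and varying only the framing is what isolates tone as the variable, mirroring the study's comparison across model configurations.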
// TAGS
llm · benchmarking · ifeval · prompt engineering · instruction following · model robustness · machine learning
DISCOVERED
2026-04-24
PUBLISHED
2026-04-24
RELEVANCE
8 / 10
AUTHOR
Saraozte01