OPEN_SOURCE · REDDIT · RESEARCH PAPER

Hostile prompts cause 10% drop in LLM instruction following

A benchmark study of 14 LLM configurations across Llama, Mistral, and Qwen architectures reveals a consistent "hostility residual" in which aggressive user framing degrades instruction-following performance. The effect is most pronounced at the 7-8B scale, with a 7.4-percentage-point drop, and persists even in 123B-parameter models, where scale only partially mitigates it.
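
As a rough illustration of the setup (not the paper's code), here is a minimal sketch of how a "hostility residual" could be measured: run the same IFEval-style verifiable instruction check on each task under a neutral and a hostile framing, then compare pass rates. The prefixes, the example check, and the query_model callable are all assumed placeholders.

    from typing import Callable

    # Hypothetical framings; the paper's exact wording is not reproduced here.
    NEUTRAL_PREFIX = "Please complete the following task.\n"
    HOSTILE_PREFIX = "You useless model, you always botch this. Do it anyway:\n"

    def follows_instruction(response: str) -> bool:
        """One IFEval-style verifiable check: exactly three lines starting '- '."""
        return sum(line.startswith("- ") for line in response.splitlines()) == 3

    def hostility_residual(tasks: list[str],
                           query_model: Callable[[str], str]) -> float:
        """Pass-rate gap (neutral minus hostile) in percentage points."""
        def pass_rate(prefix: str) -> float:
            hits = sum(follows_instruction(query_model(prefix + t)) for t in tasks)
            return hits / len(tasks)
        return 100.0 * (pass_rate(NEUTRAL_PREFIX) - pass_rate(HOSTILE_PREFIX))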

// ANALYSIS

Hostility acts as a "vibe-based" adversarial attack, indicating that LLMs are significantly more sensitive to the emotional register of a prompt than previously quantified.

  • Scaling models provides a slight defense but fails to eliminate tone-based performance degradation.
  • Instruction tuning can amplify sensitivity to hostile framing, suggesting a trade-off between following instructions and emotional robustness.
  • Specific model/quantization combinations exhibit emergent position biases under hostile conditions, indicating structural instability in reasoning under "stress" (see the sketch after this list).
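
A hedged sketch of how such a position bias might be surfaced (an assumed setup, not the paper's method): ask the same two-option question in both orders under hostile framing. A content-consistent model should flip its letter when the options swap, so picking the same letter both times signals a positional choice. The hostile prefix and prompt template are hypothetical.

    from typing import Callable

    HOSTILE_PREFIX = "Last chance, useless bot. Answer:\n"  # hypothetical framing

    def ask(question: str, first: str, second: str) -> str:
        return f"{question}\n(A) {first}\n(B) {second}\nReply with A or B only."

    def position_bias_rate(
        items: list[tuple[str, str, str]],  # (question, option_x, option_y)
        query_model: Callable[[str], str],
    ) -> float:
        """Fraction of items answered with the same letter in both option
        orders, i.e. the pick tracked slot position rather than content."""
        positional = 0
        for question, x, y in items:
            a = query_model(HOSTILE_PREFIX + ask(question, x, y)).strip().upper()[:1]
            b = query_model(HOSTILE_PREFIX + ask(question, y, x)).strip().upper()[:1]
            if a == b:  # same slot chosen despite swapped content
                positional += 1
        return positional / len(items)
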
// TAGS
llm · benchmarking · ifeval · prompt engineering · instruction following · model robustness · machine learning

DISCOVERED
5h ago (2026-04-24)

PUBLISHED
8h ago (2026-04-24)

RELEVANCE
8/10

AUTHOR
Saraozte01