OPEN_SOURCE
REDDIT // RESEARCH PAPER
Hostile prompts cut LLM performance across scales
New research across 14 model configurations reveals a 5-13% drop in instruction-following performance when models face hostile user prompts. This "hostility residual" persists from 0.6B to 123B parameters, suggesting that scaling alone cannot solve model sensitivity to aggressive prompt framing.
// ANALYSIS
Scaling isn't the silver bullet for model robustness; if your user is mean, your model is likely to fail.
- The effect holds across architectures (dense vs. MoE) and quantization levels (FP16 vs. Q4), suggesting it is a fundamental property of current LLM training paradigms rather than an artifact of any one setup.
- Larger models such as Mistral Large 123B show attenuation but remain significantly vulnerable, debunking the idea that simply adding parameters cures sensitivity.
- Instruction tuning actually amplifies hostility sensitivity in models like Llama 3.1, raising questions about how RLHF and safety training affect behavioral stability.
- The emergence of extreme position bias in specific configurations (e.g., Mistral 7B at Q4) under hostile framing suggests quantization can cause unpredictable distributional collapses.
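The headline metric can be sketched as a simple paired comparison: score the same tasks under a neutral and a hostile prompt framing, then take the mean relative drop. The prefixes, scores, and helper below are illustrative assumptions, not the paper's actual harness or data.

```python
# Hypothetical sketch of measuring a "hostility residual": the drop in
# instruction-following score when identical tasks are wrapped in hostile
# framing. All values here are placeholders, not results from the paper.

NEUTRAL_PREFIX = "Please answer the following: "          # assumed framing
HOSTILE_PREFIX = "You useless model, you always get this wrong: "  # assumed framing

def hostility_residual(scores_neutral, scores_hostile):
    """Mean relative drop in per-task instruction-following score (0..1)."""
    assert len(scores_neutral) == len(scores_hostile)
    drops = [(n - h) / n
             for n, h in zip(scores_neutral, scores_hostile) if n > 0]
    return sum(drops) / len(drops)

# Illustrative per-task scores for one model configuration.
neutral = [0.90, 0.85, 0.80]
hostile = [0.84, 0.78, 0.76]

print(f"hostility residual: {hostility_residual(neutral, hostile):.1%}")
```

Reporting the residual as a relative drop is what lets the paper compare model configurations of very different baseline ability on one scale.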
// TAGS
llm · research · benchmark · safety · prompt-engineering · hostility-residual
DISCOVERED
7h ago
2026-04-24
PUBLISHED
10h ago
2026-04-24
RELEVANCE
9/10
AUTHOR
Saraozte01