Hostile prompts cut LLM performance across scales

// 50d ago · RESEARCH PAPER


New research across 14 model configurations reveals a 5-13% drop in instruction-following performance when models face hostile user prompts. This "hostility residual" persists from 0.6B to 123B parameters, suggesting that scaling alone cannot solve model sensitivity to aggressive prompt framing.
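The sketch below makes the headline number concrete by scoring a paired neutral-vs-hostile comparison. The summary does not say how the study defines its metric, so this rests on several assumptions. Each instruction is posed once with neutral framing and once with hostile framing, and each response is graded pass/fail. The "hostility residual" is taken to be the drop in pass rate in percentage points. `PairedResult`, `pass_rate`, `hostility_residual` and the toy grades are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedResult:
    """Pass/fail grades for one instruction under both framings (hypothetical schema)."""
    neutral_pass: bool
    hostile_pass: bool


def pass_rate(flags: list[bool]) -> float:
    """Fraction of responses that followed the instruction (0.0 for an empty list)."""
    return sum(flags) / len(flags) if flags else 0.0


def hostility_residual(results: list[PairedResult]) -> float:
    """Drop in pass rate, in percentage points, from neutral to hostile framing.

    Assumption: the residual is an absolute difference in pass rates; the
    paper may define it differently (for example, relative to the neutral score).
    """
    neutral = pass_rate([r.neutral_pass for r in results])
    hostile = pass_rate([r.hostile_pass for r in results])
    return 100.0 * (neutral - hostile)


if __name__ == "__main__":
    # Toy grades, not data from the study: 9/10 pass neutral, 8/10 pass hostile.
    toy = [PairedResult(True, True)] * 8 + [
        PairedResult(True, False),
        PairedResult(False, False),
    ]
    print(f"{hostility_residual(toy):.1f} pp")  # 10.0 pp
```

Pairing each instruction with itself keeps the comparison focused on framing rather than on differences in task difficulty between the two prompt sets.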

// ANALYSIS

Scaling isn't the silver bullet for model robustness: if your user is hostile, expect your model to follow instructions measurably worse.

– The effect holds across architectures (dense vs. MoE) and quantization levels (FP16 vs. Q4), suggesting it is a property of current LLM training paradigms rather than an artifact of one architecture or precision.
– Larger models such as Mistral Large 123B show attenuation but remain significantly vulnerable, undercutting the idea that adding parameters alone cures this sensitivity.
– Instruction tuning actually amplifies hostility sensitivity in models such as Llama 3.1, raising questions about how RLHF and safety training affect behavioral stability.
– Under hostile framing, some configurations (e.g., Mistral 7B Q4) develop extreme position bias, suggesting that quantization can trigger unpredictable distributional collapse.
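A concrete metric makes the position-bias finding easier to reason about. The sketch below assumes a multiple-choice setting where the chosen option slot is recorded for every item. It also assumes the gold answers are balanced across slots, so an unbiased model would pick each slot about equally often. It then measures how far the observed choice distribution sits from uniform. The function names and the use of total variation distance are illustrative assumptions, not the paper's method.

```python
from collections import Counter


def position_shares(choices: list[int], n_options: int) -> list[float]:
    """Fraction of answers that landed on each option slot (0-indexed)."""
    if not choices:
        raise ValueError("need at least one recorded choice")
    counts = Counter(choices)
    total = len(choices)
    return [counts.get(i, 0) / total for i in range(n_options)]


def position_bias_tvd(choices: list[int], n_options: int) -> float:
    """Total variation distance between observed slot choices and a uniform distribution.

    0.0 means every slot is picked equally often; the maximum, 1 - 1/n_options,
    means the model always picks the same slot regardless of content.
    """
    uniform = 1.0 / n_options
    shares = position_shares(choices, n_options)
    return 0.5 * sum(abs(s - uniform) for s in shares)


if __name__ == "__main__":
    # Toy runs for one hypothetical 4-option benchmark, not data from the study.
    neutral_run = [0, 1, 2, 3, 1, 2, 0, 3]
    hostile_run = [0, 0, 0, 0, 0, 1, 0, 0]
    print(position_bias_tvd(neutral_run, 4))  # 0.0
    print(position_bias_tvd(hostile_run, 4))  # 0.625
```

Comparing the same model's score under neutral and hostile framing separates a framing-induced collapse from a bias the model already had.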

// TAGS

llm · research · benchmark · safety · prompt-engineering · hostility-residual

DISCOVERED: 2026-04-24 (50d ago)

PUBLISHED: 2026-04-24 (50d ago)

RELEVANCE: 9/10

AUTHOR: Saraozte01

