Small-model eval prompts break under empathy framing
A detailed LocalLLaMA guide argues that small-model evaluation prompts go off the rails when they trigger RLHF-style empathic inference instead of plain classification. Based on experiments with a production Mistral 7B sentiment pipeline and Qwen3 32B A/B tests, it recommends neutral schemas, anchored scales, explicit directives, and hard constraints in the consumption layer.
This is one of the more practical prompt-engineering writeups for people shipping smaller local models, because it treats eval quality as a systems problem instead of a wording hack. The big idea is simple: small models are decent classifiers, but shaky mind-readers, so prompt them like analyzers and clean up the rest in code.
- The D1/D2/D3 framing gives developers a useful vocabulary for why “empathetic assistant” prompts drift positive even when the input is negative
- Anchoring numeric scales and removing example values from JSON schemas addresses a real failure mode in small-model scoring: hidden distribution bias from the prompt itself
- The strongest advice is operational, not rhetorical: enforce caps, dedupe overlaps, clamp ranges, and handle malformed output in the consumption layer
- The warning that state values do not change behavior unless translated into directives is especially relevant for agent builders trying to drive tone from internal memory or emotion state
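The consumption-layer guardrails above can be sketched in a few lines. This is a minimal illustration, not code from the guide: the function name, field names, and the [-1.0, 1.0] anchored range are assumptions standing in for whatever a real pipeline uses.

```python
import json

MAX_TAGS = 5  # illustrative hard cap on labels the model may emit


def parse_sentiment(raw: str) -> dict:
    """Parse small-model JSON output, clamping and deduping rather than trusting it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Malformed output: fall back to a neutral default instead of crashing.
        return {"score": 0.0, "tags": []}

    # Clamp the numeric score into the anchored [-1.0, 1.0] range.
    try:
        score = float(data.get("score", 0.0))
    except (TypeError, ValueError):
        score = 0.0
    score = max(-1.0, min(1.0, score))

    # Dedupe overlapping tags (case-insensitive) and enforce the cap.
    raw_tags = data.get("tags")
    seen, tags = set(), []
    for tag in raw_tags if isinstance(raw_tags, list) else []:
        key = str(tag).strip().lower()
        if key and key not in seen:
            seen.add(key)
            tags.append(key)
    return {"score": score, "tags": tags[:MAX_TAGS]}
```

The point of putting this in code rather than in the prompt is the guide's core claim: the model's output is treated as untrusted input, so an out-of-range score or a duplicated label is corrected deterministically instead of re-prompted away.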
Discovered: 2026-03-08
Published: 2026-03-08
Author: Double-Risk-1945