OPEN_SOURCE
REDDIT · OPEN-SOURCE RELEASE
Contradish flags inconsistent LLM answers
Contradish is a public, MIT-licensed Python library that stress-tests LLM apps by paraphrasing prompts, rerunning the same app across the variants, and surfacing consistency scores and contradiction reports. It works with the Anthropic and OpenAI APIs, aiming to catch reliability bugs before users do.
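The core idea can be sketched without the library itself: generate paraphrased prompt variants, run each through the app, and score how consistent the answers are. The sketch below is a minimal, hypothetical illustration of that loop, not Contradish's actual API; `fake_llm` is a stand-in for a real Anthropic or OpenAI call, and `SequenceMatcher` is a deliberately crude similarity metric (a real harness would use semantic similarity).

```python
from difflib import SequenceMatcher

def consistency_score(answers: list[str]) -> float:
    """Mean pairwise similarity of answers across prompt variants (0..1)."""
    if len(answers) < 2:
        return 1.0
    scores = []
    for i in range(len(answers)):
        for j in range(i + 1, len(answers)):
            scores.append(SequenceMatcher(None, answers[i], answers[j]).ratio())
    return sum(scores) / len(scores)

# Hypothetical stand-in for the app under test; a real harness would
# call the Anthropic or OpenAI API here.
def fake_llm(prompt: str) -> str:
    return "Refunds are available within 30 days of purchase."

# Paraphrased variants of the same user intent.
variants = [
    "What is your refund policy?",
    "How long do I have to return an item?",
    "Can I get my money back after buying?",
]
answers = [fake_llm(p) for p in variants]
print(round(consistency_score(answers), 2))  # identical answers → 1.0
```

A contradiction report would then flag the variant pairs whose pairwise score falls below some cutoff, rather than reporting only the aggregate mean.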
// ANALYSIS
This is a small but genuinely useful category, closer to a unit-test harness for response stability than a vanity benchmark.
- Semantic variants are a better match for real user drift than exact-match tests, especially when wording changes but intent stays the same
- CI thresholds turn consistency into a release gate for prompt edits, model swaps, or policy updates
- The benchmark framing gives teams a shared metric, and the Python API and CLI lower the friction, so it can actually live inside existing eval workflows
- The strongest fit is support, policy, and agent workflows where contradictory answers are a trust and liability problem
- It measures consistency, not truth, so it should complement retrieval and grounding checks rather than replace them
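The release-gate point above amounts to a small piece of CI glue: compare per-suite consistency scores against a threshold and fail the build on any miss. This is a generic sketch under assumed names (the threshold value, suite names, and `gate` function are all hypothetical, not part of Contradish).

```python
CONSISTENCY_THRESHOLD = 0.85  # hypothetical gate value a team might choose

def gate(scores: dict[str, float], threshold: float = CONSISTENCY_THRESHOLD) -> int:
    """Return a CI exit code: 0 if every prompt suite meets the threshold."""
    failures = {name: s for name, s in scores.items() if s < threshold}
    for name, s in sorted(failures.items()):
        print(f"FAIL {name}: consistency {s:.2f} < {threshold:.2f}")
    return 1 if failures else 0

# Example per-suite scores a consistency harness might emit.
exit_code = gate({"refund_policy": 0.92, "shipping_faq": 0.71})
print("exit", exit_code)  # one suite below 0.85 → exit 1
```

Wiring the exit code into CI means a prompt edit or model swap that degrades answer stability blocks the merge, the same way a failing unit test would.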
// TAGS
contradish · llm · testing · devtool · open-source
DISCOVERED
2026-03-24
PUBLISHED
2026-03-24
RELEVANCE
8/10
AUTHOR
Silent_Kitchen5203