OPEN_SOURCE ↗
REDDIT // 29d ago // OPENSOURCE RELEASE
JudgeGPT launches local LLM-judge benchmarking with GPU telemetry
JudgeGPT is a new open-source benchmarking tool for running local LLM-as-judge evaluations via Ollama, with configurable rubrics, chain-of-thought-backed scoring, and blended human ratings. It combines speed and quality metrics into a configurable leaderboard while exposing GPU telemetry and benchmark history for repeatable testing.
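One way to picture a leaderboard that mixes speed and quality is a weighted sum of two normalised metrics. The sketch below is illustrative only and is not taken from JudgeGPT's code: the `RunResult` type, the 0-10 quality scale, the tokens-per-second speed metric, and the `quality_weight` parameter are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    model: str
    quality: float         # judge score, assumed to lie on a 0-10 scale
    tokens_per_sec: float  # generation throughput for the run


def leaderboard(results, quality_weight=0.7):
    """Rank runs by a weighted mix of quality and relative speed.

    Hypothetical scoring rule: quality is scaled to 0-1, speed is scaled
    relative to the fastest run, and the two are mixed by quality_weight.
    """
    if not 0.0 <= quality_weight <= 1.0:
        raise ValueError("quality_weight must be between 0 and 1")
    if not results:
        return []

    fastest = max(r.tokens_per_sec for r in results)
    rows = []
    for r in results:
        speed_norm = r.tokens_per_sec / fastest if fastest > 0 else 0.0
        quality_norm = r.quality / 10.0
        combined = quality_weight * quality_norm + (1.0 - quality_weight) * speed_norm
        rows.append((r.model, round(combined, 3)))

    # Highest combined score first.
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Changing `quality_weight` reorders the board without rerunning any benchmark, which is the kind of configurability the summary describes.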
// ANALYSIS
This is a useful step toward more auditable local eval workflows, especially for teams that distrust one-shot judge prompts.
- Behavioral scoring anchors across five criteria help reduce small-model leniency drift compared with naive judge setups.
- Runtime-configurable judge model and system prompt make judge-disagreement studies practical without editing configs.
- Self-family bias warnings and optional human score blending add guardrails against overtrusting automated scores.
- Real-time Metal/ROCm/CUDA telemetry plus SQLite history make performance-quality tradeoff tracking much easier over time.
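The two guardrails in the third bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not JudgeGPT's implementation: the function names, the linear blend with a `human_weight` parameter, and the heuristic of treating the leading letters of an Ollama-style tag (for example `llama3.1:8b`) as the model family are all hypothetical.

```python
import re


def blend_scores(judge_score, human_score=None, human_weight=0.5):
    """Linearly blend an automated judge score with an optional human rating."""
    if not 0.0 <= human_weight <= 1.0:
        raise ValueError("human_weight must be between 0 and 1")
    if human_score is None:
        # No human rating available: fall back to the judge score alone.
        return judge_score
    return (1.0 - human_weight) * judge_score + human_weight * human_score


def model_family(tag):
    """Guess a family name from a model tag (assumed heuristic).

    Drops any namespace and ':size' suffix, then keeps the leading letters,
    so 'llama3.1:8b' becomes 'llama'.
    """
    base = tag.split(":")[0].split("/")[-1].lower()
    match = re.match(r"[a-z]+", base)
    return match.group(0) if match else base


def self_family_warning(judge_tag, candidate_tag):
    """Return a warning string when judge and candidate appear to share a family."""
    if model_family(judge_tag) == model_family(candidate_tag):
        return (
            f"warning: judge '{judge_tag}' and candidate '{candidate_tag}' "
            "appear to share a model family; scores may be biased"
        )
    return None
```

A caller could display the warning next to the blended score rather than blocking the run, so that same-family comparisons remain possible but are clearly flagged.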
// TAGS
judgegpt · llm · benchmark · open-source · devtool · gpu · mlops
DISCOVERED
29d ago
2026-03-14
PUBLISHED
29d ago
2026-03-13
RELEVANCE
9/10
AUTHOR
1T_Geek