JudgeGPT launches local LLM-judge benchmarking with GPU telemetry
REDDIT // 29d ago · OPEN-SOURCE RELEASE

JudgeGPT is a new open-source benchmarking tool for running local LLM-as-judge evaluations through Ollama, with configurable rubrics, scoring backed by chain-of-thought reasoning, and optional blending with human ratings. It combines speed and quality metrics into a configurable leaderboard and exposes GPU telemetry and benchmark history for repeatable testing.
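The post does not say how JudgeGPT weights speed against quality or how much human ratings count. This is one plausible way to build such a leaderboard, assuming judge and human scores on a 1-5 scale, a linear blend controlled by a hypothetical `human_weight`, and speed normalized against the fastest model. `RunResult`, `blended_quality`, and `leaderboard` are illustrative names, not JudgeGPT's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    """One model's benchmark result (hypothetical fields, not JudgeGPT's schema)."""
    model: str
    judge_score: float            # mean judge score, assumed 1-5 scale
    human_score: Optional[float]  # mean human rating on the same scale, if any
    tokens_per_sec: float         # measured generation speed


def blended_quality(run: RunResult, human_weight: float = 0.3) -> float:
    """Linearly blend judge and human scores; use the judge score alone if no human rating."""
    if run.human_score is None:
        return run.judge_score
    return (1 - human_weight) * run.judge_score + human_weight * run.human_score


def leaderboard(runs: list[RunResult], quality_weight: float = 0.7,
                human_weight: float = 0.3) -> list[tuple[str, float]]:
    """Rank models by a weighted mix of normalized quality and normalized speed."""
    fastest = max(r.tokens_per_sec for r in runs)
    rows = []
    for r in runs:
        quality = (blended_quality(r, human_weight) - 1) / 4  # map 1-5 onto 0-1
        speed = r.tokens_per_sec / fastest                   # 0-1, fastest model = 1
        composite = quality_weight * quality + (1 - quality_weight) * speed
        rows.append((r.model, composite))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

For example, a model scoring 4.2 from the judge (4.6 from humans) at 30 tok/s ends up behind a model scoring 3.8 at 60 tok/s under the default 70/30 quality/speed split (0.731 vs 0.79), which is why exposing these weights as configuration matters.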

// ANALYSIS

This is a useful step toward more auditable local eval workflows, especially for teams that distrust one-shot judge prompts.

  • Behavioral scoring anchors across five criteria help reduce small-model leniency drift compared with naive judge setups.
  • Runtime-configurable judge model and system prompt make judge-disagreement studies practical without editing configs.
  • Self-family bias warnings and optional human score blending add guardrails against overtrusting automated scores.
  • Real-time Metal/ROCm/CUDA telemetry plus SQLite history make it much easier to track performance-quality tradeoffs over time.
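A minimal sketch of the run-history and self-family-warning ideas, using Python's standard `sqlite3` module. The table schema, the `gpu_util_pct` column, and the name-prefix heuristic in `model_family` are all assumptions for illustration; the post does not describe JudgeGPT's actual schema or how it detects same-family judges.

```python
import re
import sqlite3
import warnings
from datetime import datetime, timezone
from typing import Optional


def model_family(tag: str) -> str:
    """Crude family guess from an Ollama-style tag: 'llama3.1:8b' -> 'llama'."""
    name = tag.split(":", 1)[0].lower()
    match = re.match(r"[a-z]+", name)  # leading alphabetic run only
    return match.group(0) if match else name


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a run-history database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               id INTEGER PRIMARY KEY,
               ts TEXT NOT NULL,            -- UTC timestamp of the run
               judge TEXT NOT NULL,         -- judge model tag
               candidate TEXT NOT NULL,     -- model being evaluated
               score REAL NOT NULL,         -- judge score
               tokens_per_sec REAL NOT NULL,
               gpu_util_pct REAL,           -- telemetry sample, if available
               self_family INTEGER NOT NULL -- 1 if judge and candidate share a family
           )"""
    )
    return conn


def record_run(conn: sqlite3.Connection, judge: str, candidate: str, score: float,
               tokens_per_sec: float, gpu_util_pct: Optional[float] = None) -> bool:
    """Store one run; warn and return True if the judge looks like the candidate's family."""
    same_family = model_family(judge) == model_family(candidate)
    if same_family:
        warnings.warn(
            f"judge {judge!r} and candidate {candidate!r} appear to share a model "
            "family; scores may be biased"
        )
    conn.execute(
        "INSERT INTO runs (ts, judge, candidate, score, tokens_per_sec, gpu_util_pct, self_family) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), judge, candidate, score,
         tokens_per_sec, gpu_util_pct, int(same_family)),
    )
    conn.commit()
    return same_family


def history(conn: sqlite3.Connection, candidate: str) -> list[tuple]:
    """Return (ts, judge, score, tokens_per_sec) rows for one candidate, oldest first."""
    return conn.execute(
        "SELECT ts, judge, score, tokens_per_sec FROM runs WHERE candidate = ? ORDER BY id",
        (candidate,),
    ).fetchall()
```

Matching on the leading alphabetic part of the tag (so `llama3.1:70b` and `llama3.1:8b` both map to `llama`) is deliberately crude: fine-tuned or distilled models can share a lineage without sharing a name, so a more robust check would likely need an explicit family map.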
// TAGS
judgegpt · llm · benchmark · open-source · devtool · gpu · mlops

DISCOVERED: 29d ago (2026-03-14)
PUBLISHED: 29d ago (2026-03-13)
RELEVANCE: 9/10
AUTHOR: 1T_Geek