JudgeGPT launches local LLM-judge benchmarking with GPU telemetry

// 140d ago · OPEN-SOURCE RELEASE

JudgeGPT is a new open-source benchmarking tool for running local LLM-as-judge evaluations via Ollama, with configurable rubrics, chain-of-thought-backed scoring, and blended human ratings. It combines speed and quality metrics into a configurable leaderboard while exposing GPU telemetry and benchmark history for repeatable testing.
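To make the idea of a configurable speed-plus-quality leaderboard concrete, here is a minimal sketch. It is not taken from JudgeGPT's code: the `RunResult` type, the `leaderboard` function, the 0-10 quality scale, and the `quality_weight` parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    model: str
    quality: float         # judge score, assumed to be on a 0-10 scale
    tokens_per_sec: float  # measured generation throughput


def leaderboard(results, quality_weight=0.7):
    """Rank runs by a weighted blend of quality and relative speed.

    Hypothetical scoring rule: quality is scaled to 0-1, speed is scaled
    relative to the fastest run, and the two are mixed by quality_weight.
    """
    if not 0.0 <= quality_weight <= 1.0:
        raise ValueError("quality_weight must be between 0 and 1")
    if not results:
        return []

    fastest = max(r.tokens_per_sec for r in results)
    rows = []
    for r in results:
        quality = r.quality / 10.0
        speed = r.tokens_per_sec / fastest if fastest > 0 else 0.0
        score = quality_weight * quality + (1.0 - quality_weight) * speed
        rows.append((r.model, round(score, 4)))

    # Highest composite score first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


runs = [
    RunResult("model-a", quality=8.0, tokens_per_sec=50.0),
    RunResult("model-b", quality=6.0, tokens_per_sec=100.0),
]
for model, score in leaderboard(runs, quality_weight=0.7):
    print(model, score)
```

Changing `quality_weight` changes the ranking: at 0.7 the faster `model-b` narrowly leads, while at 1.0 only quality counts and `model-a` wins. That sensitivity is why a leaderboard of this kind benefits from an explicit, recorded weight.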

// ANALYSIS

This is a useful step toward more auditable local eval workflows, especially for teams that distrust one-shot judge prompts.

– Behavioral scoring anchors across five criteria help reduce small-model leniency drift compared with naive judge setups.
– Runtime-configurable judge model and system prompt make judge-disagreement studies practical without editing configs.
– Self-family bias warnings and optional human score blending add guardrails against overtrusting automated scores.
– Real-time Metal/ROCm/CUDA telemetry plus SQLite history make performance-quality tradeoff tracking much easier over time.
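The two guardrails mentioned above, self-family bias warnings and human score blending, could look roughly like the sketch below. This is an illustration under stated assumptions, not JudgeGPT's implementation: the function names, the `human_weight` parameter, and the family heuristic (the leading letters of an Ollama-style model tag) are all hypothetical.

```python
import re


def model_family(name):
    """Crude family key (assumption): the leading letters of the model tag,
    so 'llama3.1:70b' and 'llama3:8b' both map to 'llama'."""
    match = re.match(r"[a-z]+", name.lower())
    return match.group(0) if match else name.lower()


def same_family(judge_model, candidate_model):
    """True when judge and candidate appear to share a model family,
    which is when a self-family bias warning would be worth showing."""
    return model_family(judge_model) == model_family(candidate_model)


def blend_scores(judge_score, human_score=None, human_weight=0.5):
    """Mix an automated judge score with an optional human rating.

    With no human rating the judge score passes through unchanged.
    """
    if not 0.0 <= human_weight <= 1.0:
        raise ValueError("human_weight must be between 0 and 1")
    if human_score is None:
        return judge_score
    return (1.0 - human_weight) * judge_score + human_weight * human_score


if same_family("llama3.1:70b", "llama3:8b"):
    print("warning: judge and candidate share a model family")
print(blend_scores(8.0, human_score=6.0, human_weight=0.5))
```

A name-prefix heuristic is deliberately simple and will miss fine-tunes published under unrelated names, so a real tool would likely need an explicit family mapping.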

// TAGS

judgegpt, llm, benchmark, open-source, devtool, gpu, mlops

DISCOVERED

140d ago

2026-03-14

PUBLISHED

140d ago

2026-03-13

RELEVANCE

9/10

AUTHOR

1T_Geek
