OPEN_SOURCE
REDDIT // 4h ago · BENCHMARK RESULT
SOB Benchmarks Value Accuracy, Not Just JSON
Structured Output Benchmark (SOB) argues that schema-valid JSON is the wrong bar for structured generation, because models can still hallucinate values, misorder arrays, or map dates incorrectly. Its leaderboard adds value accuracy, faithfulness, path recall, structure coverage, and perfect-response scoring across text, image, and audio.
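To make the gap between "valid JSON" and "correct values" concrete, here is a minimal Python sketch. It is not SOB's scoring code: the `flatten` and `score` helpers, the simplified definitions of path recall and value accuracy, and the invoice data are all assumptions made for illustration, and SOB's actual metric definitions may differ.

```python
import json


def flatten(obj, prefix=""):
    """Map every leaf value to a path string such as 'items[0].sku'."""
    if isinstance(obj, dict) and obj:
        out = {}
        for key, val in obj.items():
            out.update(flatten(val, f"{prefix}.{key}" if prefix else key))
        return out
    if isinstance(obj, list) and obj:
        out = {}
        for i, val in enumerate(obj):
            out.update(flatten(val, f"{prefix}[{i}]"))
        return out
    # Scalars and empty containers are treated as leaves.
    return {prefix: obj}


def score(raw_output, expected):
    """Hypothetical scorer: compares a model's raw text output to a gold object."""
    try:
        predicted = json.loads(raw_output)
    except json.JSONDecodeError:
        return {"json_pass": False, "path_recall": 0.0,
                "value_accuracy": 0.0, "perfect": False}

    gold = flatten(expected)
    pred = flatten(predicted)
    present = [p for p in gold if p in pred]             # expected paths the model emitted
    correct = [p for p in present if pred[p] == gold[p]]  # ...that also hold the right value
    return {
        "json_pass": True,
        "path_recall": len(present) / len(gold),
        "value_accuracy": len(correct) / len(gold),
        "perfect": pred == gold,
    }


# Hypothetical gold record and model output: the output parses and has every
# expected path, but the day and month are swapped and the items are reordered.
expected = {
    "invoice_id": "INV-1",
    "date": "2024-03-05",
    "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}],
}
raw_output = (
    '{"invoice_id": "INV-1", "date": "2024-05-03", '
    '"items": [{"sku": "B", "qty": 1}, {"sku": "A", "qty": 2}]}'
)

result = score(raw_output, expected)
print(result)
```

In this example the output passes the JSON check and reaches full path recall, yet only one of the six leaf values (`invoice_id`) matches, so a shape-only check would report success while most of the extracted values are wrong.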
// ANALYSIS
The benchmark is directionally right: if downstream systems depend on exact fields, “valid JSON” is mostly a low-effort pass condition, not proof of useful extraction.
- Value accuracy is the metric that matters for real workflows such as invoice parsing, UI automation, and document extraction
- The gap between JSON pass rate and value accuracy is the real headline, because it suggests most models are optimized for shape, not correctness
- Multimodal coverage matters here: structured-output failures are no longer just a text problem
- The open-source setup makes this more credible as a community reference point, not just a vendor marketing chart
- GLM 4.7 placing near the top suggests open models are closing in on frontier systems for deterministic tasks
// TAGS
structured-output-benchmark-sob · benchmark · structured-output · multimodal · open-source · research
DISCOVERED
4h ago
2026-04-29
PUBLISHED
7h ago
2026-04-28
RELEVANCE
9/10
AUTHOR
404llm