SOB Benchmarks Value Accuracy, Not Just JSON
REDDIT · 4h ago · BENCHMARK RESULT

Structured Output Benchmark (SOB) argues that schema-valid JSON is the wrong bar for structured generation, because models can still hallucinate values, misorder arrays, or map dates incorrectly. Its leaderboard adds value accuracy, faithfulness, path recall, structure coverage, and perfect-response scoring across text, image, and audio.
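
The source does not say how these metrics are computed. A minimal sketch, assuming each JSON document is flattened into leaf paths (array indices included) and compared position by position against a gold answer. The helpers `flatten` and `score` and the definitions of `path_recall`, `value_accuracy` and `perfect_response` are hypothetical stand-ins, not SOB's published scoring code; faithfulness and structure coverage are left out.

```python
import json
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested JSON into {path: leaf_value}; list positions become part of the path."""
    if isinstance(obj, dict):
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in obj.items()]
    elif isinstance(obj, list):
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(obj)]
    else:
        return {prefix: obj}  # leaf value
    out: dict[str, Any] = {}
    for path, child in items:
        out.update(flatten(child, path))
    return out


def score(gold: Any, pred: Any) -> dict[str, float]:
    """Hypothetical path-level metrics; not SOB's actual definitions."""
    g, p = flatten(gold), flatten(pred)
    found = [path for path in g if path in p]             # gold paths the model produced
    correct = [path for path in found if p[path] == g[path]]  # ...with the exact gold value
    return {
        "path_recall": len(found) / len(g) if g else 1.0,
        "value_accuracy": len(correct) / len(g) if g else 1.0,
        # Perfect only if every gold value is right and no extra paths were emitted.
        "perfect_response": float(len(correct) == len(g) and len(p) == len(g)),
    }


# Hypothetical invoice example.
GOLD = {
    "invoice_id": "INV-001",
    "date": "2026-04-01",
    "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}],
}
# Valid JSON with the right shape, but the date is mis-mapped and the items are misordered.
RAW_PRED = '{"invoice_id": "INV-001", "date": "2026-01-04", "items": [{"sku": "B", "qty": 1}, {"sku": "A", "qty": 2}]}'

scores = score(GOLD, json.loads(RAW_PRED))
print(scores)
```

Here every gold path is present (path recall 1.0), yet only `invoice_id` matches, so value accuracy is 1/6 and the response is not perfect. Comparing arrays by position is a design choice that makes misordering count as an error; a benchmark could instead match array elements order-insensitively.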

// ANALYSIS

The benchmark is directionally right: if downstream systems depend on exact fields, “valid JSON” is mostly a low-effort pass condition, not proof of useful extraction.

  • Value accuracy is the metric that matters for real workflows like invoice parsing, UI automation, and document extraction
  • The JSON-pass vs value-accuracy gap is the real headline, because it suggests most models are optimized for shape, not correctness
  • Multimodal coverage matters here: structured output failures are not just text problems anymore
  • The open-source setup makes this more credible as a community reference point, not just a vendor marketing chart
  • GLM 4.7 placing near the top suggests open models are closing in on frontier systems for tightly specified extraction tasks
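
To make the JSON-pass vs value-accuracy gap concrete, here is a minimal sketch over a hypothetical batch of invoice-parsing outputs. The gold record, the outputs, and both scoring functions (`json_pass`, `value_accuracy`) are illustrative assumptions, not SOB's harness; a "JSON pass" here only means the output parses and has the expected keys.

```python
import json

# Hypothetical gold record for one invoice-parsing task.
GOLD = {"vendor": "Acme", "total": 120.50, "due_date": "2026-05-01"}

# Hypothetical raw model outputs for that task.
OUTPUTS = [
    '{"vendor": "Acme", "total": 120.50, "due_date": "2026-05-01"}',       # fully correct
    '{"vendor": "Acme", "total": 125.00, "due_date": "2026-05-01"}',       # wrong total
    '{"vendor": "ACME Corp", "total": 120.50, "due_date": "2026-01-05"}',  # wrong vendor, day/month swapped
    '{"vendor": "Acme", "total": 120.50',                                  # truncated, not valid JSON
]


def json_pass(raw: str) -> bool:
    """Shape-only check: parses as a JSON object with exactly the expected keys."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and set(obj) == set(GOLD)


def value_accuracy(raw: str) -> float:
    """Fraction of gold fields whose value matches exactly; failed shape checks score 0."""
    if not json_pass(raw):
        return 0.0
    obj = json.loads(raw)
    return sum(obj[k] == v for k, v in GOLD.items()) / len(GOLD)


pass_rate = sum(json_pass(o) for o in OUTPUTS) / len(OUTPUTS)
mean_acc = sum(value_accuracy(o) for o in OUTPUTS) / len(OUTPUTS)
print(f"JSON pass: {pass_rate:.2f}, mean value accuracy: {mean_acc:.2f}")
```

Three of the four outputs pass the shape check (0.75), yet mean value accuracy is only 0.50, because a wrong total or a swapped date is invisible to a parser.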

// TAGS
structured-output-benchmark-sob · benchmark · structured-output · multimodal · open-source · research

DISCOVERED: 2026-04-29 (4h ago)
PUBLISHED: 2026-04-28 (7h ago)
RELEVANCE: 9/10
AUTHOR: 404llm