SOB Benchmarks Value Accuracy, Not Just JSON

// BENCHMARK RESULT

Structured Output Benchmark (SOB) argues that schema-valid JSON is the wrong bar for structured generation, because models can still hallucinate values, misorder arrays, or map dates incorrectly. Its leaderboard adds value accuracy, faithfulness, path recall, structure coverage, and perfect-response scoring across text, image, and audio.
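To make the failure modes concrete, here is a minimal sketch of an output that parses and has the right shape yet gets every field wrong. The invoice record, the field names, and the `same_shape` helper are hypothetical illustrations, not taken from SOB.

```python
import json

# Hypothetical ground truth and model output; not SOB data.
expected = {"invoice_date": "2024-03-04", "total": 118.50, "items": ["cable", "adapter"]}
model_output = '{"invoice_date": "2024-04-03", "total": 181.50, "items": ["adapter", "cable"]}'

def same_shape(a, b):
    """Return True when two JSON values share keys, list lengths, and leaf types."""
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(same_shape(a[k], b[k]) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(same_shape(x, y) for x, y in zip(a, b))
    return type(a) is type(b)

predicted = json.loads(model_output)      # parses cleanly, so it is valid JSON
shape_ok = same_shape(expected, predicted)  # structure and types also match

# Fields whose values differ from the ground truth: a swapped day and month,
# a transposed total, and a reordered array.
wrong_fields = sorted(k for k in expected if expected[k] != predicted[k])

print(shape_ok, wrong_fields)
```

A shape-only check accepts this output, while a value-level comparison flags all three fields.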

// ANALYSIS

The benchmark is directionally right: if downstream systems depend on exact fields, “valid JSON” is mostly a low-effort pass condition, not proof of useful extraction.

– Value accuracy is the metric that matters for real workflows like invoice parsing, UI automation, and document extraction
– The JSON-pass vs value-accuracy gap is the real headline, because it suggests many models are tuned to produce the right shape rather than the right values
– Multimodal coverage matters here: structured output failures are no longer just a text problem
– The open-source setup makes this more credible as a community reference point, not just a vendor marketing chart
– GLM 4.7 placing near the top suggests open models are closing in on frontier systems for deterministic tasks
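The gap between JSON pass and value accuracy can be sketched with a small scorer. The metric definitions below are assumptions made for illustration (leaf-path comparison against a ground-truth object) and may differ from how SOB actually computes its scores. The `flatten` and `score` helpers and the sample record are hypothetical.

```python
import json

def flatten(value, prefix=""):
    """Map every leaf of a JSON value to a path such as 'items[0].sku'."""
    if isinstance(value, dict):
        leaves = {}
        for key, child in value.items():
            leaves.update(flatten(child, f"{prefix}.{key}" if prefix else key))
        return leaves
    if isinstance(value, list):
        leaves = {}
        for i, child in enumerate(value):
            leaves.update(flatten(child, f"{prefix}[{i}]"))
        return leaves
    return {prefix: value}

def score(expected, raw_output):
    """Score one raw model output against a ground-truth object (assumed definitions)."""
    try:
        predicted = json.loads(raw_output)
    except json.JSONDecodeError:
        return {"json_pass": False, "path_recall": 0.0,
                "value_accuracy": 0.0, "perfect": False}
    exp, pred = flatten(expected), flatten(predicted)
    total = max(len(exp), 1)  # avoid dividing by zero on an empty ground truth
    present = [p for p in exp if p in pred]           # expected paths the model emitted
    correct = [p for p in present if pred[p] == exp[p]]  # ...with the right value
    return {
        "json_pass": True,
        "path_recall": len(present) / total,
        "value_accuracy": len(correct) / total,
        "perfect": pred == exp,  # every leaf matches and nothing extra was added
    }

# Hypothetical example: valid JSON, every path present, most values wrong.
expected = {"vendor": "Acme", "total": 42.0, "items": [{"sku": "A1"}, {"sku": "B2"}]}
raw = '{"vendor": "Acme", "total": 24.0, "items": [{"sku": "B2"}, {"sku": "A1"}]}'
result = score(expected, raw)
print(result)
```

In this example the output passes the JSON check and recalls every path, yet only one of four leaf values is correct, which is the kind of gap the bullets above describe.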

// TAGS

structured-output-benchmark-sob, benchmark, structured-output, multimodal, open-source, research

DISCOVERED: 2026-04-29

PUBLISHED: 2026-04-28

RELEVANCE: 9/10

AUTHOR: 404llm
