IDP Leaderboard pits 16 document VLMs

// 77d ago · BENCHMARK RESULT

Nanonets has launched the IDP Leaderboard, an open benchmark and results explorer for document AI covering 16 models, 9,000+ real documents, and three suites: OlmOCR, OmniDocBench, and IDP Core. Gemini 3.1 Pro leads overall at 83.2, but the more telling result is how narrow the gap among top-tier models becomes once you set aside reasoning-heavy VQA tasks.

// ANALYSIS

This is more useful than yet another one-number model ranking because it exposes raw predictions, failure modes, and cost/performance tradeoffs on real document workloads. For AI teams building OCR, KIE, or table pipelines, that kind of transparency matters more than a glossy benchmark win.

– The standout product feature is the Results Explorer, which shows model outputs beside ground truth instead of hiding behind aggregate scores
– Gemini 3.1 Pro leads overall, but cheaper variants like Flash and Sonnet stay surprisingly close on extraction-heavy tasks, suggesting reasoning is where premium models still justify their cost
– GPT-5.4’s jump over GPT-4.1 is significant, especially on DocVQA and table extraction, making document understanding one of the clearer areas of recent model progress
– Sparse unstructured tables and handwriting OCR remain stubbornly hard, which is exactly the kind of reality check production teams need before trusting vendor accuracy claims
– The benchmark is open, reproducible, and linked to public datasets and code, which gives it more credibility than closed vendor bakeoffs

// TAGS

idp-leaderboard, benchmark, research, multimodal, data-tools, open-source

DISCOVERED

77d ago

2026-03-11

PUBLISHED

77d ago

2026-03-11

RELEVANCE

8/10

AUTHOR

shhdwi
