OPEN_SOURCE
REDDIT · 6h ago · MODEL RELEASE
DharmaOCR tops OCR benchmarks, cuts costs
Dharma-AI open-sourced DharmaOCR on Hugging Face, including models and datasets, alongside a paper detailing the training and benchmark setup. The 3B and 7B specialized OCR models reportedly beat GPT-5.4, Gemini 3.1 Pro, Claude Opus 4.6, Google Document AI, and other open-source baselines on structured document extraction.
// ANALYSIS
This is a strong reminder that narrow, schema-driven document tasks can still favor specialized SLMs over giant general-purpose models, especially when cost and failure modes matter as much as raw accuracy.
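To make "schema-driven" concrete: these pipelines are judged on whether the model emits every required field with the right type, not on free-form text quality. A minimal sketch of that validation step, with hypothetical field names (not DharmaOCR's actual schema):

```python
# Illustrative only: validating an OCR model's structured output against a
# fixed schema. Field names are hypothetical, not DharmaOCR's real schema.
import json

SCHEMA = {"doc_id": str, "issue_date": str, "total": float}

def validate_extraction(raw_json: str, schema=SCHEMA):
    """Parse the model's JSON output and check that every required field
    is present with the expected type; return (record, errors)."""
    try:
        record = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc}"]
    errors = []
    for field, ftype in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            errors.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return record, errors
```

Failure on this check is a hard, countable error, which is why "failure modes" can be measured alongside accuracy.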
- The paper says DPO with the model’s own degenerate outputs as rejected examples cut failure rate by 87.6%, which is a practical fix for OCR loop behavior rather than just a benchmark trick.
- The reported 0.925 score for the 7B model and 0.911 for the 3B model suggest specialization is doing the heavy lifting, not parameter count alone.
- AWQ quantization reportedly reduces per-page inference cost by about 22% with negligible quality loss, which matters if you are deploying OCR at scale.
- The benchmark spans printed, handwritten, and legal/administrative documents in Brazilian Portuguese, so this is more than a generic OCR demo; it is a domain-specific system with a real workload in mind.
- For teams building document pipelines, the interesting takeaway is not “replace all LLMs,” but “use smaller specialized models where extraction quality, latency, and unit economics are the product.”
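The DPO point above can be sketched in a few lines. Nothing here reflects Dharma-AI's actual pipeline (which is not reproduced in this post); it is a generic illustration of how a model's own repetition-loop outputs could be collected as rejected examples, paired against gold transcriptions as chosen examples:

```python
# Hypothetical sketch: mining a model's degenerate OCR outputs (repetition
# loops) as DPO "rejected" examples, with the gold transcription as "chosen".
# All function and field names are illustrative assumptions.

def is_degenerate(text: str, window: int = 20, max_repeats: int = 4) -> bool:
    """Flag outputs stuck in a loop: the same short token window
    repeated max_repeats or more times in a row."""
    tokens = text.split()
    if len(tokens) < window * max_repeats:
        return False
    for start in range(len(tokens) - window * max_repeats + 1):
        chunk = tokens[start:start + window]
        repeats, pos = 1, start + window
        while tokens[pos:pos + window] == chunk:
            repeats += 1
            pos += window
            if repeats >= max_repeats:
                return True
    return False

def build_dpo_pairs(samples):
    """samples: iterable of (prompt, model_output, gold_transcription).
    Keep only looped outputs; emit DPO-style preference triples."""
    return [
        {"prompt": prompt, "chosen": gold, "rejected": output}
        for prompt, output, gold in samples
        if is_degenerate(output)
    ]
```

The appeal of this recipe is that the rejected set costs nothing to label: the failure mode is detectable automatically, so the preference data scales with inference volume rather than annotation budget.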
// TAGS
dharmaocr · open-source · fine-tuning · benchmark · multimodal · inference
DISCOVERED
6h ago
2026-04-24
PUBLISHED
8h ago
2026-04-24
RELEVANCE
9/10
AUTHOR
augusto_camargo3