DharmaOCR tops OCR benchmarks, cuts costs
Dharma-AI open-sourced DharmaOCR on Hugging Face, including models and datasets, alongside a paper detailing the training and benchmark setup. The 3B and 7B specialized OCR models reportedly beat GPT-5.4, Gemini 3.1 Pro, Claude Opus 4.6, Google Document AI, and open-source baselines on structured document extraction.
This is a strong reminder that narrow, schema-driven document tasks can still favor specialized SLMs over giant general-purpose models, especially when cost and failure modes matter as much as raw accuracy.
- The paper says DPO with the model’s own degenerate outputs as rejected examples cut failure rate by 87.6%, which is a practical fix for OCR loop behavior rather than just a benchmark trick.
- The reported 0.925 score for the 7B model and 0.911 for the 3B model suggest specialization is doing the heavy lifting, not parameter count alone.
- AWQ quantization reportedly reduces per-page inference cost by about 22% with negligible quality loss, which matters if you are deploying OCR at scale.
- The benchmark spans printed, handwritten, and legal/administrative documents in Brazilian Portuguese, so this is more than a generic OCR demo; it is a domain-specific system with a real workload in mind.
- For teams building document pipelines, the interesting takeaway is not “replace all LLMs,” but “use smaller specialized models where extraction quality, latency, and unit economics are the product.”
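
The DPO point above can be made concrete with a small sketch of how preference pairs might be assembled from a model's own looping outputs. This is not the paper's pipeline: the repeated n-gram heuristic, the thresholds, the helper names (`max_ngram_repeats`, `is_degenerate`, `build_preference_pairs`), and the sample fields (`prompt`, `reference`, `model_outputs`) are all assumptions made for illustration. The `prompt`/`chosen`/`rejected` record layout is a common convention for preference datasets, not something the source specifies.

```python
from collections import Counter


def max_ngram_repeats(text: str, n: int = 4) -> int:
    """Return how many times the most frequent word n-gram occurs in text."""
    words = text.split()
    if len(words) < n:
        return 0
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return max(grams.values())


def is_degenerate(text: str, n: int = 4, threshold: int = 3) -> bool:
    """Heuristic (an assumption, not the paper's rule): flag an output as
    looping when any word n-gram occurs at least `threshold` times."""
    return max_ngram_repeats(text, n) >= threshold


def build_preference_pairs(samples):
    """Pair each clean reference (chosen) with the model's own looping
    outputs (rejected). The field names here are hypothetical."""
    pairs = []
    for sample in samples:
        if is_degenerate(sample["reference"]):
            # Skip documents whose ground truth is itself repetitive,
            # since the heuristic cannot tell them apart from loops.
            continue
        for output in sample["model_outputs"]:
            if is_degenerate(output):
                pairs.append({
                    "prompt": sample["prompt"],
                    "chosen": sample["reference"],
                    "rejected": output,
                })
    return pairs
```

Given a sample whose `model_outputs` contains one clean transcription and one that repeats the same phrase over and over, `build_preference_pairs` would emit a single record with the looping text as `rejected`. A real pipeline would likely need a detector tuned to the documents at hand, since tables and forms can repeat phrases legitimately.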
DISCOVERED
2026-04-24
PUBLISHED
2026-04-24
AUTHOR
augusto_camargo3