Zhipu AI drops 0.9B GLM-OCR for complex document parsing
Zhipu AI has released GLM-OCR, a lightweight 0.9B-parameter multimodal model purpose-built for high-efficiency document parsing. Built on the GLM-V/4V framework, it pairs a CogViT visual encoder with a GLM decoder and uses multi-token prediction to achieve roughly 50% higher throughput than standard autoregressive decoding. The model excels at extracting structured Markdown, JSON, and LaTeX from complex tables, mathematical formulas, and handwriting, even in messy real-world scans with stamps or poor lighting.
Zhipu AI is proving that you don't need 70B parameters to solve complex document understanding, delivering a specialized 0.9B model that punches way above its weight class.
- Multi-token prediction (MTP) boosts decoding throughput by ~50% over standard autoregressive models
- Native support for LaTeX and structured JSON makes it a drop-in replacement for expensive proprietary parsing APIs
- Small enough to run on consumer hardware (0.9B parameters) while maintaining SOTA performance on OmniDocBench
- Specialized robustness for "real-world" messy scans, including stamps, seals, and rotated text
- Seamless integration with Ollama and vLLM ensures immediate developer accessibility for local edge deployment
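Because vLLM exposes an OpenAI-compatible chat endpoint, querying a locally served GLM-OCR comes down to posting a standard vision-chat payload. A minimal sketch of building such a request in Python; the model id (`glm-ocr`), prompt wording, and temperature choice are illustrative assumptions, not confirmed defaults:

```python
import base64
import json

def build_ocr_request(image_bytes: bytes, model: str = "glm-ocr") -> dict:
    """Construct an OpenAI-style chat payload asking the model to emit
    structured Markdown with LaTeX formulas.

    The model id and prompt text are assumptions; vLLM serves whatever
    id the model was launched under.
    """
    # Images travel as base64 data URIs inside an image_url content part.
    data_uri = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": data_uri}},
                    {
                        "type": "text",
                        "text": (
                            "Extract this document as structured Markdown; "
                            "render formulas as LaTeX and tables as Markdown tables."
                        ),
                    },
                ],
            }
        ],
        "temperature": 0.0,  # deterministic decoding suits parsing tasks
    }

if __name__ == "__main__":
    # In practice this payload would be POSTed to the vLLM server,
    # e.g. http://localhost:8000/v1/chat/completions.
    payload = build_ocr_request(b"\x89PNG-placeholder-bytes")
    print(json.dumps(payload)[:80])
```

The same payload shape works against any OpenAI-compatible endpoint, so switching between a local vLLM instance and a hosted API is a matter of changing the base URL.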
DISCOVERED
2026-03-17
PUBLISHED
2026-03-17
AUTHOR
AI Revolution