YT · YOUTUBE // 26d ago // MODEL RELEASE

Zhipu AI drops 0.9B GLM-OCR for complex document parsing

Zhipu AI has released GLM-OCR, a lightweight 0.9B-parameter multimodal model purpose-built for high-efficiency document parsing. Built on the GLM-V/4V framework, it pairs a CogViT visual encoder with a GLM decoder and uses multi-token prediction to reach roughly 50% higher decoding throughput than standard autoregressive models. The model excels at extracting structured Markdown, JSON, and LaTeX from complex tables, mathematical formulas, and handwriting, even in messy real-world scans with stamps or poor lighting.
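
Because the model targets structured output, it can be queried like any other multimodal chat model. Below is a minimal sketch of calling a locally hosted GLM-OCR instance through an OpenAI-compatible endpoint such as the one vLLM exposes; the served model name, port, and prompt wording are illustrative assumptions, not values confirmed by the release.

```python
# Minimal sketch: query a locally served GLM-OCR instance via an
# OpenAI-compatible endpoint (e.g. started with `vllm serve`).
# The model name "glm-ocr", the port, and the prompt text are assumptions.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Encode a scanned page as a base64 data URL for the multimodal request.
with open("invoice_scan.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="glm-ocr",  # assumed served model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text",
                 "text": "Extract this document as Markdown. Render tables as "
                         "Markdown tables and formulas as LaTeX."},
            ],
        }
    ],
    temperature=0.0,  # deterministic output is usually preferable for parsing
)

print(response.choices[0].message.content)
```

Swapping the prompt to ask for JSON or LaTeX-only output is the intended way to steer the structured format, per the capabilities described above.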

// ANALYSIS

Zhipu AI is proving that you don't need 70B parameters to solve complex document understanding, delivering a specialized 0.9B model that punches way above its weight class.

  • Multi-token prediction (MTP) boosts decoding throughput by ~50% over standard autoregressive models
  • Native support for LaTeX and structured JSON makes it a drop-in replacement for expensive proprietary parsing APIs
  • Small enough to run on consumer hardware (0.9B parameters) while maintaining SOTA performance on OmniDocBench
  • Specialized robustness for "real-world" messy scans, including stamps, seals, and rotated text
  • Integration with Ollama and vLLM makes local and edge deployment straightforward for developers (see the sketch after this list)
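
For the Ollama route, here is a hedged sketch of local inference with the Ollama Python client; the local model tag "glm-ocr", the image path, and the prompt are assumptions for illustration and are not confirmed by the release notes.

```python
# Minimal sketch of local inference through the Ollama Python client,
# assuming a GLM-OCR build is available locally under the tag "glm-ocr"
# (the tag, file path, and prompt wording are assumptions).
import ollama

response = ollama.chat(
    model="glm-ocr",  # assumed local model tag
    messages=[
        {
            "role": "user",
            "content": "Parse this page into JSON with fields for title, "
                       "tables, and formulas (formulas in LaTeX).",
            # Ollama accepts image file paths alongside the text prompt.
            "images": ["./contract_page_3.png"],
        }
    ],
)

print(response["message"]["content"])
```
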
// TAGS
zhipu-ai · glm-ocr · multimodal · ocr · llm · open-weights · markdown · document-parsing

DISCOVERED

26d ago (2026-03-17)

PUBLISHED

26d ago (2026-03-17)

RELEVANCE

8/10

AUTHOR

AI Revolution