OPEN_SOURCE ↗
REDDIT // 25d ago // INFRASTRUCTURE

LightOnOCR-2 memory spike sparks GLM-OCR caution

A LocalLLaMA user reports an OOM on a 16GB M4 MacBook Air when running LightOnOCR-2 through Transformers, and roughly 40GB of total allocation (11GB VRAM + 30GB RAM) under vLLM once prompting begins, then asks whether the GLM-OCR SDK will behave the same way. The post highlights a practical deployment gap: a small parameter count says little about real multimodal OCR inference memory usage.
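A back-of-envelope estimate shows why weights alone undersell peak memory; every shape below is an illustrative assumption, not a measured LightOnOCR-2 or GLM-OCR config:

```python
# Rough memory estimate for a small decoder-style OCR VLM.
# All shapes are illustrative assumptions, NOT actual
# LightOnOCR-2 or GLM-OCR configuration values.

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, dtype_bytes: int = 2) -> float:
    """KV cache = 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes."""
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes / 1e9

# A ~1B-parameter model in bf16 is only ~2 GB of weights...
weights_gb = 1e9 * 2 / 1e9

# ...but one dense page can inject thousands of image tokens plus a long
# transcription: assume 4,000 image tokens + 8,000 output tokens.
cache_gb = kv_cache_gb(layers=24, kv_heads=8, head_dim=128, seq_len=12_000)

print(f"weights ≈ {weights_gb:.1f} GB, per-request KV cache ≈ {cache_gb:.2f} GB")
```

A per-request cache in that range is survivable on its own; the headline ~40GB under vLLM mostly reflects vLLM preallocating its KV-cache pool up front (its `gpu_memory_utilization` default is 0.9), not what a single request strictly needs.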

// ANALYSIS

Hot take: this looks more like expected multimodal inference behavior than a setup mistake, especially once generation starts and caches explode.

  • Post-prompt spikes are common in OCR VLMs because image tokens, long outputs, and KV cache can dominate memory more than raw model weights.
  • LightOnOCR-2’s own guidance emphasizes rendering constraints and vLLM runtime flags, which suggests serving/config choices strongly affect peak memory.
  • GLM-OCR is marketed as a compact 0.9B model with efficiency-focused decoding, but it is still in the same document-VLM class and can spike on large pages or long outputs.
  • On 16-18GB unified-memory laptops, reliable local runs usually require stricter page batching, lower pixel budgets, and tighter token limits.
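Those constraints can be expressed directly as serving flags. A hypothetical vLLM launch for a memory-constrained box might look like the sketch below; the model id is a placeholder, and which pixel-budget knobs a given model honors should be checked against its model card:

```shell
# Illustrative memory-capping flags for vLLM serving.
# Model id is a placeholder; page pixel budgets are usually
# set on the processor side rather than via these flags.
vllm serve lightonai/LightOnOCR-2 \
  --gpu-memory-utilization 0.6 \
  --max-model-len 8192 \
  --max-num-seqs 2 \
  --limit-mm-per-prompt '{"image": 1}'
```

Here `--gpu-memory-utilization` shrinks the preallocated KV-cache pool, `--max-model-len` bounds prompt-plus-output tokens per request, `--max-num-seqs` limits concurrent pages in flight, and `--limit-mm-per-prompt` caps images per request.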
// TAGS
lightonocr-2 · glm-ocr · multimodal · inference · gpu · sdk · vllm

DISCOVERED

25d ago · 2026-03-17

PUBLISHED

25d ago · 2026-03-17

RELEVANCE

7 / 10

AUTHOR

ShOkerpop