GLM-4.7-Flash hits agentic memory wall in Claude CLI
OPEN_SOURCE
REDDIT // 6h ago · MODEL RELEASE


A local LLM user reports that GLM-4.7-Flash Q4_K_M fails to maintain context and execute file operations when driven through the Claude CLI, despite performing adequately in direct chat. The issues highlight architectural friction between MoE models and agentic interfaces in local environments, likely exacerbated by memory constraints and early-stage driver bugs.

// ANALYSIS

Using GLM-4.7-Flash as a local agent is currently a gamble due to MoE-specific bugs and template mismatches in Ollama.

  • The reported "dumbness" points to a failure in XML tool tag parsing or broken chat templates in early Ollama releases.
  • 30B MoE models require careful prompt engineering; the model often gets stuck in "think loops" or loses state during the switch between tool-calling and response generation.
  • 6GB VRAM is insufficient for a 30B model with Q4 quantization; heavy offloading to system RAM likely causes the 10+ minute response times and frequent stalls.
  • Updating Ollama to v0.15.1+ is the standard fix for the "sigmoid" scoring bug that previously crippled this model's logic.
  • For a more reliable agent experience on 32GB RAM, Llama 3.1 8B or Qwen 2.5 7B typically provide superior instruction following for CLI tasks compared to larger, under-resourced MoEs.
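The memory shortfall in the third bullet can be sanity-checked with quick arithmetic. This is a sketch: the ~4.85 bits-per-weight average for Q4_K_M is an approximation, and KV cache and runtime overhead are ignored.

```python
# Rough weight footprint of a 30B-parameter model quantized with Q4_K_M.
# 4.85 bits/weight is an assumed average for Q4_K_M; real GGUF files vary
# per tensor, and this ignores KV cache and runtime buffers entirely.
PARAMS = 30e9
BITS_PER_WEIGHT = 4.85

weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
vram_gb = 6.0
fit_fraction = vram_gb / weights_gb  # share of weights that fit on-GPU

print(f"weights ~= {weights_gb:.1f} GB")
print(f"only ~{fit_fraction:.0%} fits in {vram_gb:.0f} GB VRAM; "
      f"the rest streams from system RAM")
```

At roughly 18 GB of weights against 6 GB of VRAM, about two thirds of every forward pass runs from system RAM, which is consistent with the reported 10+ minute responses.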
// TAGS
glm-4.7-flash · ollama · agent · claude-code · moe · local-llm

DISCOVERED

6h ago

2026-04-15

PUBLISHED

9h ago

2026-04-15

RELEVANCE

8/10

AUTHOR

Agent0o6