OPEN_SOURCE
REDDIT // MODEL RELEASE · 4d ago
Gemma 4 31B loops in llama.cpp
Users are reporting infinite looping issues with Google's new Gemma 4 31B model in recent llama.cpp builds. The bug stems from a combination of incorrect newline tokenization and misclassified control tokens that disrupt the model's new reasoning and tool-calling architecture.
// ANALYSIS
Gemma 4's specialized "thinking" mode is exposing deep-seated tokenizer assumptions in local inference engines.
- A critical bug in llama.cpp was splitting double newlines into separate tokens, causing the model to lose coherence in long-form reasoning sessions.
- Specialized tool-call tokens were incorrectly classified as user-defined instead of control tokens, preventing the parser from identifying reasoning boundaries.
- Users on build b8693 should verify their GGUF files; files exported before April 4, 2026, lack the tokenizer fixes regardless of the runtime version.
- Stability requires the `--jinja` flag, so the model's embedded chat template is processed correctly, and `--min-p 0.0`, so min-p sampling does not interfere with Gemma 4's sampling logic.
- A new "Unified KV Cache" update has significantly improved VRAM efficiency, but users must update to the latest master branch to benefit from the reduced memory footprint.
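The flag recommendations above can be sketched as a single launch command. This is a minimal illustration, not a verified recipe: the GGUF filename and context size are placeholders, and only `--jinja` and `--min-p` come from the reports themselves.

```shell
# Illustrative llama-server launch for Gemma 4 31B.
# Assumes a current master build and a GGUF exported after 2026-04-04;
# the model filename and -c value are hypothetical.
./llama-server \
  -m ./gemma-4-31b.gguf \
  --jinja \
  --min-p 0.0 \
  -c 8192
```

If loops persist with these flags, re-exporting the GGUF with up-to-date conversion scripts is the other lever the thread points at, since pre-fix files carry the broken tokenizer metadata with them.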
// TAGS
gemma-4-31b · llama-cpp · llm · reasoning · open-weights · self-hosted
DISCOVERED
2026-04-08
PUBLISHED
2026-04-07
RELEVANCE
9/10
AUTHOR
Express_Quail_1493