Gemma 4 fixes hit llama.cpp, Google updates templates
Google has released updated Jinja chat templates for the Gemma 4 model family to address critical tool-calling failures. Simultaneously, llama.cpp has merged a fix for the reasoning budget sampler, enabling proper local support for the model's native "thinking" capabilities.
Gemma 4's reasoning capabilities are finally becoming usable in local environments, but the initial GGUFs were "broken", so most users still need to intervene manually. The new chat templates are mandatory for the 31B, 26B, and "E" variants to fix tool-calling transitions, and llama.cpp PR #21697 implements reasoning budget support correctly by populating the previously missing thinking tags. Vision performance can be tuned by manually adjusting token limits, and temperatures as high as 1.5 reportedly improve one-shot coding performance. Until a model is re-quantized with the April 9th metadata updates, overriding the template manually via --chat-template-file remains necessary.
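The manual override described above could look roughly like the following sketch, which only assembles a llama-server command line and prints it. It assumes a recent llama.cpp build that includes the fixes. The model and template file names are hypothetical placeholders. The --jinja, --temp, and --reasoning-budget flags are not named in the report and are included here as assumptions about a typical setup.

```python
import shlex


def build_server_command(model_path, template_path,
                         temperature=1.0, reasoning_budget=-1):
    """Assemble an argv list for llama-server with a template override.

    All paths are placeholders supplied by the caller. A reasoning_budget
    of -1 leaves thinking unrestricted and 0 disables it; other values
    are assumed to depend on the build containing the sampler fix.
    """
    return [
        "llama-server",
        "-m", model_path,
        "--jinja",                               # enable Jinja template rendering
        "--chat-template-file", template_path,   # override the embedded template
        "--temp", str(temperature),
        "--reasoning-budget", str(reasoning_budget),
    ]


# Hypothetical file names; substitute the GGUF and the updated template in use.
cmd = build_server_command(
    "gemma-4.gguf",
    "gemma-4-template.jinja",
    temperature=1.5,  # upper end of the range reported for one-shot coding
)
print(shlex.join(cmd))
```

Building the arguments as a list keeps each flag next to its value, so the same list can be passed to subprocess.run without shell quoting issues. Once a GGUF has been re-quantized with the updated metadata, the --chat-template-file pair should no longer be needed.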
DISCOVERED: 2026-04-10
PUBLISHED: 2026-04-10
AUTHOR: andy2na