Gemma 4 fails multi-turn tool calling
Local LLM users report that Gemma 4's native function calling abruptly terminates generation during complex, multi-turn tool sequences. While initial tool calls succeed, the model fails to continue after responding to the user when subsequent tools are required.
Gemma 4's native function calling is a welcome addition, but its brittle execution in local environments highlights the gap between hosted APIs and open-weight models.
- –The bug triggers specifically during extended agentic loops (think -> tool -> respond -> think -> tool), causing an immediate generation halt.
- –This severely limits the model's viability for autonomous workflows that require continuous, multi-step reasoning and interaction.
- –The failure likely stems from how local inference engines or quantization formats parse the model's new thinking and tool-use tokens.
- –Fixes will likely require client-side updates in inference frameworks like llama.cpp or Ollama to properly handle the token streams.
DISCOVERED
45d ago
2026-04-15
PUBLISHED
45d ago
2026-04-15
RELEVANCE
AUTHOR
dampflokfreund