Llama.cpp samplers fail on Gemma 4 architecture
OPEN_SOURCE ↗
REDDIT · 4h ago · NEWS


Users report that llama.cpp samplers are essentially ignored by the new Gemma 4 models, leading to repetitive, deterministic outputs even at extreme temperatures. The regression is linked to missing logit soft-capping support and recent architectural changes in backend sampling.

// ANALYSIS

The "no variance" bug in Gemma 4 isn't just a settings issue; it's an architectural mismatch between legacy samplers and new logit dynamics.

  • Gemma 4's logit soft-capping requires specialized handling that was only recently stabilized in PR #21390.
  • The migration of sampling logic directly into the CUDA computation graph has introduced state synchronization errors for floating-point parameters.
  • Users are seeing coherent output at temperature 1000 because the sampler is defaulting to greedy decoding when it encounters invalid logit states.
  • Re-downloading GGUFs is mandatory for many users, since early April 2026 quants lacked the necessary `add_bos` token and the metadata for Gemma 4's specific reasoning budget.
  • The community is pivoting toward "Min P" and the new "Reasoning Budget Sampler" as the only reliable way to maintain quality on these newer architectures.
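The soft-capping the first bullet refers to can be sketched in a few lines. This is the tanh-based logit soft-capping used by earlier Gemma generations (Gemma 2 shipped a final-logit cap of 30.0); the cap value here is illustrative, not a confirmed Gemma 4 parameter:

```python
import numpy as np

def soft_cap(logits: np.ndarray, cap: float) -> np.ndarray:
    """Tanh-based logit soft-capping, as used in the Gemma family.

    Squashes logits smoothly into the open interval (-cap, cap),
    so extreme values are compressed while small logits pass
    through nearly unchanged. A sampler that skips this step sees
    logit magnitudes the model never intended, which can collapse
    the distribution toward a single token.
    """
    return cap * np.tanh(logits / cap)

logits = np.array([-100.0, 0.0, 5.0, 100.0])
capped = soft_cap(logits, cap=30.0)
# extremes are squashed toward ±30; the 5.0 logit is barely changed
```

If the backend applies temperature to uncapped logits, even huge temperature values cannot restore variance, which matches the "coherent at temperature 1000" reports.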
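The "Min P" strategy the last bullet mentions is simple to state: keep only tokens whose probability is at least some fraction of the top token's probability, then renormalize. A minimal sketch, assuming a plain probability vector as input (the 0.1 threshold below is illustrative; llama.cpp's documented default for `min_p` is lower):

```python
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float) -> np.ndarray:
    """Min-P filtering over a probability distribution.

    Tokens with probability below min_p * max(probs) are zeroed
    out, and the survivors are renormalized. Because the cutoff
    scales with the top token's confidence, the filter adapts to
    both peaked and flat distributions.
    """
    threshold = min_p * probs.max()
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()

probs = np.array([0.70, 0.20, 0.06, 0.04])
filtered = min_p_filter(probs, min_p=0.1)
# cutoff is 0.07 (= 0.1 * 0.70): the last two tokens are dropped,
# and the remaining mass is renormalized over the survivors
```

This adaptivity is why the community reaches for Min P when other samplers misbehave: it degrades gracefully even when upstream logit handling is slightly off.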
// TAGS
llama-cpp · gemma-4 · inference · sampling · open-source · llm

DISCOVERED

2026-04-19 (4h ago)

PUBLISHED

2026-04-19 (6h ago)

RELEVANCE

8 / 10

AUTHOR

kaisurniwurer