OPEN_SOURCE
REDDIT // 4h ago · NEWS
Llama.cpp samplers fail on Gemma 4 architecture
Users report that llama.cpp samplers are essentially ignored by the new Gemma 4 models, leading to repetitive, deterministic outputs even at extreme temperatures. The regression is linked to missing logit soft-capping support and recent architectural changes in backend sampling.
// ANALYSIS
The "no variance" bug in Gemma 4 isn't just a settings issue; it's an architectural mismatch between legacy samplers and new logit dynamics.
- Gemma 4's logit soft-capping requires specialized handling that was only recently stabilized in PR #21390.
- The migration of sampling logic directly into the CUDA computation graph has introduced state synchronization errors for floating-point parameters.
- Users see coherent output even at temperature 1000 because the sampler falls back to greedy decoding when it encounters invalid logit states.
- Re-downloading GGUFs is mandatory for many users, as early April 2026 quants lacked the `add_bos` flag and the metadata needed for Gemma 4's specific reasoning budget.
- The community is pivoting toward "Min P" and the new "Reasoning Budget Sampler" as the only reliable way to maintain quality on these newer architectures.
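To make the two mechanisms above concrete, here is a minimal Python sketch of (a) Gemma-style logit soft-capping and (b) Min-P filtering. The cap value of 30.0 is an assumption borrowed from earlier Gemma releases, not a confirmed Gemma 4 constant, and the function names are illustrative rather than llama.cpp's actual API:

```python
import math

def soft_cap(logits, cap=30.0):
    # Gemma-style soft-capping squashes every logit into (-cap, cap)
    # via cap * tanh(logit / cap). Samplers tuned for raw logits can
    # misbehave on these compressed values, which is one reason legacy
    # sampling paths need specialized handling for this family.
    return [cap * math.tanh(x / cap) for x in logits]

def min_p_filter(logits, min_p=0.05, temperature=1.0):
    # Min-P sampling: after temperature scaling and softmax, keep only
    # tokens whose probability is at least min_p times the probability
    # of the single most likely token. The threshold adapts to how
    # peaked the distribution is, which is why it stays usable where
    # fixed top-k / top-p cutoffs break down.
    scaled = [x / temperature for x in logits]
    m = max(scaled)                              # subtract max for stability
    probs = [math.exp(x - m) for x in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    threshold = min_p * max(probs)
    return [(i, p) for i, p in enumerate(probs) if p >= threshold]

# Example: a very large raw logit (120.0) is compressed to just under 30,
# so the gap to its competitors shrinks dramatically after capping.
capped = soft_cap([120.0, 80.0, 10.0])
kept = min_p_filter([2.0, 1.0, -5.0], min_p=0.1)
```

This is only a didactic sketch of the sampling math, not a description of how llama.cpp's CUDA-graph sampler actually implements either step.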
// TAGS
llama-cpp · gemma-4 · inference · sampling · open-source · llm
DISCOVERED
4h ago
2026-04-19
PUBLISHED
6h ago
2026-04-19
RELEVANCE
8/10
AUTHOR
kaisurniwurer