OPEN_SOURCE
REDDIT // MODEL RELEASE
Google releases Gemma 4 under Apache 2.0
Google DeepMind's Gemma 4 family lands with a move to Apache 2.0 and a 31B dense model that pushes reasoning benchmarks. While the architecture enables a 256K context window, early users are reporting massive KV cache memory overhead due to non-standard head dimensions.
// ANALYSIS
Gemma 4 is a strategic pivot for Google, trading memory efficiency for pure reasoning power and an open-source friendly license.
- The 31B model's "heterogeneous" attention architecture uses head dimensions of 256 and 512, far larger than the 128 standard in models like Qwen or Llama.
- KV cache quantization (4-bit or 8-bit) is no longer optional on consumer hardware; it is a hard requirement to reach the advertised 256K context window.
- The shift to Apache 2.0 is the real story, likely a move to reclaim developer-ecosystem market share from Meta's Llama.
- Early benchmarks put the 31B variant at #3 on the LMSYS Arena, effectively matching GPT-4o in a local-first package.
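The head-dimension point above can be sketched numerically. The snippet below estimates KV cache size from the usual formula (2 tensors per layer × layers × KV heads × head dim × context × bytes per element); the layer and KV-head counts are hypothetical placeholders, since Gemma 4's actual config is not given here, but the ratios show why 4x-larger heads make quantization mandatory at 256K context.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bits: int) -> int:
    """Total KV cache size in bytes: K and V tensors across all layers."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bits // 8

# Hypothetical configs -- 48 layers / 8 KV heads are illustrative, not Gemma 4's real specs.
CTX = 256_000
baseline = kv_cache_bytes(48, 8, 128, CTX, 16)  # 128-dim heads, fp16
big_head = kv_cache_bytes(48, 8, 512, CTX, 16)  # 512-dim heads, fp16
big_q4   = kv_cache_bytes(48, 8, 512, CTX, 4)   # 512-dim heads, 4-bit quantized

GiB = 1024 ** 3
print(f"128-dim fp16 : {baseline / GiB:.1f} GiB")
print(f"512-dim fp16 : {big_head / GiB:.1f} GiB")  # 4x the baseline
print(f"512-dim 4-bit: {big_q4 / GiB:.1f} GiB")    # back to baseline size
```

Under these assumptions, quadrupling head dimension quadruples the cache, and 4-bit quantization exactly cancels it out relative to a 128-dim fp16 model, which matches the "hard requirement" framing above.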
// TAGS
gemma-4 · google · llm · open-weights · open-source · kv-cache · reasoning
DISCOVERED
8d ago
2026-04-03
PUBLISHED
8d ago
2026-04-03
RELEVANCE
10/10
AUTHOR
IngeniousIdiocy