OPEN_SOURCE
REDDIT // 4h ago · BENCHMARK RESULT
Gemma 4 31B tops closed chatbots on a translation prompt
A LocalLLaMA user reports that Google’s open-weight Gemma 4 31B at Q4 handled a difficult Chinese novel translation prompt better than ChatGPT 5.3, Gemini Chat, Qwen, and GPT OSS 120B. The anecdotal test highlights familiar local-model advantages: reproducible behavior, less platform filtering, and direct control over the exact model being run.
// ANALYSIS
This is not a rigorous benchmark, but it is exactly the kind of workflow-specific eval that matters to developers more than leaderboard averages.
- Gemma 4 31B is an open-weight dense model from Google DeepMind with multilingual support, a 256K context window, and local deployment paths through Hugging Face and other runtimes.
- Translation with hidden identities and name consistency stresses long-context tracking, entity resolution, and style control, where small regressions are very visible to users; a minimal check of this kind is sketched after this list.
- The complaint about closed-model A/B testing is the real punchline: if a provider silently changes routing or safety behavior, users can lose a working workflow overnight.
- The result also cuts against the assumption that Gemini Chat must expose Google’s best Gemma-like behavior; consumer chat products are shaped by routing, guardrails, cost, and UX constraints.
- Treat this as a strong prompt-level data point, not proof that Gemma 4 beats frontier closed models generally.
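A minimal sketch of what such a prompt-level eval can look like against a locally served open-weight model, assuming a llama.cpp-style server that exposes an OpenAI-compatible /v1/chat/completions endpoint on localhost:8080; the model name, weights path, glossary entries, and input file are illustrative placeholders, not real artifacts.

```python
# Workflow-specific translation eval against a local open-weight model.
# Assumptions: a llama.cpp-style server with an OpenAI-compatible chat endpoint;
# model name, weights path, glossary, and input file are hypothetical.
import hashlib
import requests

ENDPOINT = "http://localhost:8080/v1/chat/completions"
MODEL = "gemma-4-31b-q4"                      # whatever name the local runtime registers
WEIGHTS = "models/gemma-4-31b-q4.gguf"        # hypothetical local weights file
PINNED_SHA256 = "replace-with-recorded-hash"  # recorded once, checked on every run

# The glossary pins how character names must be rendered, including the alias
# used before an identity is revealed (the "hidden identity" case).
GLOSSARY = {
    "沈青": "Shen Qing",
    "夜枭": "Night Owl",  # alias for Shen Qing until the reveal
}

def verify_weights() -> bool:
    """Confirm the exact weights file is the one this eval was written against,
    so the workflow cannot change underneath you the way a routed API can."""
    digest = hashlib.sha256(open(WEIGHTS, "rb").read()).hexdigest()
    return digest == PINNED_SHA256

def translate(chapter_text: str) -> str:
    """Send one chapter through the local model with a fixed prompt and
    temperature 0, so repeated runs stay comparable."""
    resp = requests.post(ENDPOINT, json={
        "model": MODEL,
        "temperature": 0,
        "messages": [
            {"role": "system", "content": (
                "Translate the Chinese web-novel chapter into English. "
                "Render character names exactly as given in the glossary.")},
            {"role": "user", "content": f"Glossary: {GLOSSARY}\n\n{chapter_text}"},
        ],
    }, timeout=600)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def name_consistency_issues(translation: str) -> list[str]:
    """Flag glossary entries left untranslated or never rendered as expected,
    the small regressions that readers of a long serial notice immediately."""
    issues = []
    for source, target in GLOSSARY.items():
        if source in translation:
            issues.append(f"untranslated name left in output: {source}")
        if target not in translation:
            issues.append(f"expected rendering missing: {target}")
    return issues

if __name__ == "__main__":
    if not verify_weights():
        raise SystemExit("weights file does not match the pinned hash")
    chapter = open("chapter_12.txt", encoding="utf-8").read()  # illustrative input
    for issue in name_consistency_issues(translate(chapter)):
        print("FAIL:", issue)
```

Because the weights file and prompt are pinned locally, the same check can be rerun unchanged whenever a new quant or model is tried, which is exactly the reproducibility the closed-model A/B complaint is about.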
// TAGS
gemma-4 · gemma-4-31b · llm · open-weights · self-hosted · benchmark · inference · translation
DISCOVERED
4h ago
2026-04-22
PUBLISHED
6h ago
2026-04-21
RELEVANCE
8 / 10
AUTHOR
ThisGonBHard