llama.cpp users swap launch flags for Gemma 4
OPEN_SOURCE ↗
REDDIT // 4d ago · TUTORIAL

This Reddit post is a practical troubleshooting thread about running Gemma 4 models through `llama.cpp`, specifically the `llama-server` path for multimodal use. The author says recent builds now load the models, but generation is still either poor in quality or unexpectedly slow, with one RTX 6000 Pro setup reaching only about 3 tokens per second. They are asking for known-good startup flags for Heretic Gemma 4 GGUFs, including image analysis support, and want examples from others who have already reached usable performance.

// ANALYSIS

This reads like a tuning and compatibility question, not a product launch, and the bottleneck is probably in the runtime configuration rather than the hardware alone.

  • The post centers on `llama-server` flags for Gemma 4, with `--jinja`, `--mmproj`, very large context, and wide image token bounds all in play.
  • The author is seeing both degraded output and low throughput, which suggests a mix of template, multimodal, quantization, or offload issues.
  • Because the ask is for “working init strings,” the thread is likely to surface community-tested launch recipes rather than a single canonical fix.
  • The workload is explicitly multimodal image analysis, so performance expectations are higher than for text-only inference.
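For orientation, a launch line of the general shape such threads tend to converge on is sketched below. This is an illustration, not a verified recipe from the post: the model and projector filenames are placeholders, and the context size and offload count are assumptions to tune against available VRAM, not values confirmed to work with these GGUFs.

```shell
# Hypothetical llama-server launch for a multimodal Gemma GGUF (sketch only).
# Filenames and numeric values are placeholders, not known-good settings.
#   -m / --mmproj : text model plus the vision projector required for image input
#   --jinja       : apply the chat template embedded in the GGUF
#   -c            : context size; very large contexts cost VRAM and slow prefill
#   -ngl          : layers to offload to the GPU (99 = effectively all of them)
llama-server \
  -m gemma-4-heretic-Q4_K_M.gguf \
  --mmproj mmproj-gemma-4.gguf \
  --jinja \
  -c 16384 \
  -ngl 99 \
  --host 127.0.0.1 --port 8080
```

When throughput lands around 3 tokens per second on a workstation-class GPU, a common first check is the server's startup log: if fewer layers were offloaded than `-ngl` requested, or the KV cache spilled into system RAM because the context was set too large, inference silently falls back to much slower paths.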
// TAGS
llama-cpp · llama-server · gemma-4 · multimodal · gguf · inference · cuda · performance

DISCOVERED

4d ago

2026-04-08

PUBLISHED

4d ago

2026-04-08

RELEVANCE

8 / 10

AUTHOR

AlwaysLateToThaParty