Qwen 3.5, Gemma 4 benchmarks hit 79 tok/s

// 91d ago · BENCHMARK RESULT

A comparative benchmark of Qwen 3.5 and Gemma 4 on a dual-GPU setup (RTX 4070 plus RTX 3060) shows large performance gains on consumer hardware. Sparse mixture-of-experts (MoE) models like Qwen 3.5 reach generation speeds of up to 79 tokens/s, while Gemma 4 trades context capacity for reasoning depth.

// ANALYSIS

Qwen 3.5 35B-A3B reaches 79 tokens/s on consumer hardware, a 40% gain in sustained throughput over previous dense architectures. Adding a secondary RTX 3060 speeds up prompt processing by 1.5x. Gemma 4 31B offers stronger reasoning but hits VRAM limits at 15k tokens of context, whereas Qwen handles 50k+. Inference engines such as LM Studio now provide day-0 optimizations for these hybrid architectures, though uneven VRAM utilization across the GPUs remains a minor friction point.
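To make the figures above concrete, here is a back-of-the-envelope sketch that turns them into per-token latency and prefill time. It assumes the 40% gain is measured relative to the dense baseline. The 20,000-token prompt and the 1,000 tok/s single-GPU prefill rate are made-up illustrative values, not results from the benchmark.

```python
# Hypothetical back-of-the-envelope helpers. Only the 79 tok/s, +40% and 1.5x
# figures come from the benchmark; the prompt size and prefill rate are made up.


def ms_per_token(tokens_per_s: float) -> float:
    """Average decode latency per generated token, in milliseconds."""
    return 1000.0 / tokens_per_s


def implied_baseline(tokens_per_s: float, speedup_pct: float) -> float:
    """Baseline throughput if `tokens_per_s` is `speedup_pct` percent higher than it."""
    return tokens_per_s / (1.0 + speedup_pct / 100.0)


def prefill_seconds(prompt_tokens: int, prefill_tok_s: float, boost: float = 1.0) -> float:
    """Prompt-processing time, with an optional multiplicative speedup."""
    return prompt_tokens / (prefill_tok_s * boost)


if __name__ == "__main__":
    decode = 79.0  # tok/s reported for Qwen 3.5 35B-A3B
    print(f"{ms_per_token(decode):.1f} ms/token")  # 12.7 ms/token
    print(f"dense baseline ~{implied_baseline(decode, 40):.1f} tok/s")  # ~56.4 tok/s
    # Assumed: a 20,000-token prompt at 1,000 tok/s prefill on the RTX 4070 alone.
    alone = prefill_seconds(20_000, 1_000.0)
    with_3060 = prefill_seconds(20_000, 1_000.0, boost=1.5)
    print(f"prefill {alone:.1f}s -> {with_3060:.1f}s")  # 20.0s -> 13.3s
```

Under these assumptions, a 1.5x prefill boost cuts a 20-second prompt-processing wait to about 13 seconds. That matters more for long-context workloads than the decode rate does.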

// TAGS

gpu · llm · benchmark · qwen-3-5 · gemma-4 · open-source · lm-studio · local-llm-benchmarking

DISCOVERED

91d ago

2026-04-14

PUBLISHED

91d ago

2026-04-13

RELEVANCE

8/10

AUTHOR

DracoTorpedo

// KEEP READING

MODEL · 27m ago

GPT-5.6 retains reasoning context across turns

A key architectural detail of OpenAI's new GPT-5.6 model family has been revealed. Predecessor models discarded Chain of Thought (CoT) context at each turn to save context-window space. GPT-5.6 instead keeps its reasoning context across the entire conversation history, preserving its logical chain and intermediate reasoning steps throughout multi-turn interactions.
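The difference between discarding and retaining reasoning can be sketched as a history-filtering step. This is a minimal illustration using a hypothetical in-memory history in which each item carries a role and a turn number. The `Item` type, the role labels, and `build_context` are invented for this example; they are not OpenAI's API or its actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    role: str  # hypothetical labels: "user", "assistant", "reasoning"
    text: str
    turn: int  # conversation turn the item was produced in


def build_context(history: List[Item], current_turn: int, keep_reasoning: bool) -> List[Item]:
    """Select the history items to send to the model for `current_turn`.

    keep_reasoning=False drops reasoning items from earlier turns (the older,
    discard-per-turn behaviour); keep_reasoning=True forwards them unchanged.
    """
    context: List[Item] = []
    for item in history:
        from_earlier_turn = item.turn < current_turn
        if item.role == "reasoning" and from_earlier_turn and not keep_reasoning:
            continue  # discard stale chain-of-thought to save context space
        context.append(item)
    return context


history = [
    Item("user", "Plan a migration.", 1),
    Item("reasoning", "Step 1: inventory services...", 1),
    Item("assistant", "Here is a plan.", 1),
    Item("user", "Now estimate the cost.", 2),
]
print([i.role for i in build_context(history, 2, keep_reasoning=False)])
# ['user', 'assistant', 'user']
print([i.role for i in build_context(history, 2, keep_reasoning=True)])
# ['user', 'reasoning', 'assistant', 'user']
```

Retaining reasoning costs context tokens on every later turn. In exchange, the model sees why it gave its earlier answers, not only what it said.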

OPEN SOURCE · 3h ago

scroll-world launches scroll-driven 3D flight skill

scroll-world is an open-source, framework-agnostic agent skill that leverages Higgsfield to generate immersive, scroll-driven 3D camera flights through diorama scenes for landing pages. By rendering seamless connection clips between neighboring frames, it allows developers to build interactive 3D narrative websites navigated simply by scrolling, without requiring heavy game engines.
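One way to drive such a flight is to map scroll position onto the chain of connection clips. The sketch below is hypothetical: the post does not describe how scroll-world does this, and `scroll_to_clip` is an invented name. It assumes N frames joined by N - 1 clips, with scroll progress normalized to [0, 1]. A page would then seek clip `index` to `progress * clip_duration`.

```python
from typing import Tuple


def scroll_to_clip(scroll_fraction: float, num_frames: int) -> Tuple[int, float]:
    """Map page scroll progress in [0, 1] to (clip_index, progress_in_clip).

    With `num_frames` diorama frames there are `num_frames - 1` connection clips,
    one between each pair of neighbouring frames.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to form a connection clip")
    clips = num_frames - 1
    s = min(max(scroll_fraction, 0.0), 1.0)  # clamp overscroll
    position = s * clips
    index = min(int(position), clips - 1)  # keep s == 1.0 on the last clip
    return index, position - index


for s in (0.0, 0.375, 1.0):
    print(s, scroll_to_clip(s, num_frames=5))
# 0.0 (0, 0.0)
# 0.375 (1, 0.5)
# 1.0 (3, 1.0)
```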

MODEL · 4h ago

OpenAI GPT-5.6 hits Amazon Bedrock

OpenAI's GPT-5.6 model family, including Sol, Terra, and Luna, is now generally available on Amazon Bedrock. Running on Bedrock's next-generation inference engine, the models support prompt caching with a 90% discount and match OpenAI's first-party pricing.
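A rough calculation shows what a 90% prompt-caching discount means per request. The sketch assumes the discount applies to cached input tokens only, ignores output tokens and any cache-write surcharge, and uses a hypothetical price of $2.00 per million input tokens. It is not actual Bedrock pricing.

```python
def input_cost(total_tokens: int, cached_tokens: int, price_per_mtok: float,
               cache_discount: float = 0.90) -> float:
    """Dollar cost of one request's input tokens when cached tokens are discounted.

    Ignores output tokens and any cache-write surcharge (assumptions).
    """
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    uncached = total_tokens - cached_tokens
    cached_rate = price_per_mtok * (1.0 - cache_discount)
    return (uncached * price_per_mtok + cached_tokens * cached_rate) / 1_000_000


# Hypothetical price: $2.00 per million input tokens.
full = input_cost(100_000, 0, 2.00)
cached = input_cost(100_000, 80_000, 2.00)
print(f"${full:.3f} without cache, ${cached:.3f} with 80k cached tokens")
```

With 80% of a 100k-token prompt served from cache, input cost falls from $0.200 to about $0.056 under these assumptions.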