OPEN_SOURCE
REDDIT · 3h ago · BENCHMARK RESULT
Qwen3.6-27B tests 3-GPU speed ceiling
A Reddit user reports 18-20 t/s generation and about 650 t/s prompt processing on Q8 quants across three Radeon 7900 XTX GPUs in llama.cpp. The post asks whether those numbers are normal and what tuning tricks actually move the needle in multi-GPU AMD setups.
// ANALYSIS
The numbers do not look wildly off for a dense 27B model on consumer AMD hardware, but they do suggest the decode path, not raw VRAM capacity, is the bottleneck.
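That reading can be sanity-checked with back-of-envelope arithmetic: decode is memory-bandwidth-bound, so the ceiling is roughly bandwidth divided by the bytes of weights read per token. The figures below are assumptions for illustration (roughly 1 byte/param at Q8, roughly 960 GB/s peak bandwidth for a 7900 XTX), not numbers from the post.

```python
# Back-of-envelope decode ceiling for a dense 27B model at Q8.
# Assumed (not from the post): ~1 byte/param at Q8 (ignoring quant
# scales and KV cache), ~960 GB/s peak bandwidth per 7900 XTX.
params = 27e9              # 27B parameters
bytes_per_param = 1.0      # Q8 ~ 1 byte per weight
weight_bytes = params * bytes_per_param
bandwidth = 960e9          # bytes/s, single card

# With llama.cpp's default layer split, each token still traverses
# every layer in sequence, so bandwidth does NOT sum across cards:
# the single-card ceiling is the relevant one.
ceiling_tps = bandwidth / weight_bytes
print(f"ideal decode ceiling: {ceiling_tps:.1f} t/s")  # ~35.6 t/s

# Reported 18-20 t/s is roughly half that ceiling, which is plausible
# once inter-GPU synchronization and launch overhead are counted.
print(f"efficiency at 19 t/s: {19 / ceiling_tps:.0%}")
```

Under these assumptions, the reported numbers sit at about 50-55% of the single-card bandwidth ceiling, which is why tuning the split and sync behavior matters more than adding VRAM.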
- 27B Q8 across 3x 7900 XTX is already a high-friction inference setup, so scaling gains will come from tuning more than from simply adding cards
- 650 t/s prompt processing is decent; the gap is that decode speed often flattens out because of split overhead, synchronization, and KV-cache behavior
- This is a useful real-world datapoint, since AMD multi-GPU llama.cpp performance is discussed far less often than CUDA/NVIDIA setups
- The most relevant knobs are likely tensor-split strategy, batch sizes, context settings, and build/driver versions rather than the model alone
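The knobs above map to llama.cpp CLI flags roughly as follows. This is a hedged sketch, not a known-good configuration: flag spellings match recent llama.cpp builds but vary by version, and the model path and split ratios are placeholders.

```shell
# Sketch of the multi-GPU tuning knobs as llama.cpp flags (placeholders,
# not the poster's actual command; verify against your build's --help).
#
# --split-mode layer keeps whole layers on each GPU (less sync traffic);
#   "row" splits individual matmuls across cards and is worth A/B testing.
# --tensor-split sets the per-GPU share; uneven ratios can help if one
#   card also drives a display.
# --batch-size / --ubatch-size mainly move prompt-processing throughput.
# --flash-attn reduces KV-cache bandwidth pressure where the ROCm/HIP
#   backend supports it (on some builds this flag takes on/off/auto).
./llama-cli -m ./qwen-27b-q8_0.gguf \
    --n-gpu-layers 99 \
    --split-mode layer \
    --tensor-split 1,1,1 \
    --batch-size 2048 \
    --ubatch-size 512 \
    --ctx-size 8192 \
    --flash-attn
```

A/B testing `--split-mode layer` against `row` with `llama-bench` is usually the fastest way to see which side of the sync-versus-bandwidth tradeoff a given driver build lands on.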
// TAGS
qwen3llama-cppai-codinginferencegpuself-hostedopen-weightsbenchmark
DISCOVERED
3h ago
2026-04-25
PUBLISHED
7h ago
2026-04-24
RELEVANCE
8 / 10
AUTHOR
SemaMod