Dual RTX 3060s power $400 Qwen3.6-27B rig

// BENCHMARK RESULT

This post benchmarks an ultra-budget dual RTX 3060 setup running Unsloth’s Qwen3.6-27B GGUF variants in llama.cpp on CUDA. The author reports strong, stable throughput on a dated PCIe 3.0 x8/x8 platform: multi-token prediction (MTP) pushes generation into the low 40s t/s, while non-MTP mode trades some speed for more context at a still-solid ~30 t/s. The main tradeoff is that tensor-parallel mode currently cannot be combined with KV-cache quantization, which caps usable context and makes very long prompts awkward.
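Why KV-cache quantization matters so much for context length follows from the usual cache-size formula: keys and values are stored for every attention layer, KV head, head dimension, and token. The sketch below is illustrative only. It assumes a hypothetical model shape (64 attention layers, 8 KV heads, head dimension 128) that is not the published Qwen3.6-27B configuration, which may well use fewer full-attention layers. The q8_0 and q4_0 sizes follow llama.cpp's block formats (32 values plus a 2-byte fp16 scale per block).

```python
# Hypothetical model shape for illustration: NOT the published Qwen3.6-27B config.
N_LAYERS = 64     # assumed number of attention layers that keep a KV cache
N_KV_HEADS = 8    # assumed grouped-query KV heads
HEAD_DIM = 128    # assumed per-head dimension

# Bytes per cached element for a few llama.cpp cache types.
# q8_0 and q4_0 store blocks of 32 values plus a 2-byte fp16 scale.
BYTES_PER_ELEM = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def kv_cache_bytes(n_ctx: int, cache_type: str = "f16") -> float:
    """Total size of the K and V caches for n_ctx tokens (the leading 2 counts K and V)."""
    elems = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * n_ctx
    return elems * BYTES_PER_ELEM[cache_type]


if __name__ == "__main__":
    for ctx in (32_768, 65_536, 163_840):
        sizes = ", ".join(
            f"{t}={kv_cache_bytes(ctx, t) / 2**30:.1f} GiB" for t in BYTES_PER_ELEM
        )
        print(f"{ctx:>7} tokens: {sizes}")
```

Under these assumed numbers, an unquantized f16 cache at 160k tokens would need 40 GiB on its own, more than the 24 GB two 12 GB cards provide together before any weights are loaded, while q4_0 would shrink it to about 11 GiB. That gap is why losing cache quantization in tensor-parallel mode hurts long-context use.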

// ANALYSIS

Hot take: this is the kind of result that makes a “budget local LLM rig” feel real rather than theoretical. Two cheap 3060s running llama.cpp on CUDA deliver far more than their price suggests, especially on stability.

– Prefill holds up at 12k context: about 456 t/s with MTP, after an initial peak above 600 t/s.
– Generation reaches 43.26 t/s with MTP and about 31 t/s without it, a strong tradeoff for local use.
– The old i7-4770K/Z87 platform is not the bottleneck people would assume, because PCIe 3.0 x8/x8 is competitive with the lane splits on many newer consumer boards.
– The biggest downside is architectural: `-sm tensor` cannot currently be combined with KV-cache quantization, so 160k-class contexts are out of reach in this configuration.
– vLLM appears to be the wrong tool for this VRAM-constrained setup; llama.cpp is the practical winner.
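The PCIe point can be checked with the usual theoretical-bandwidth arithmetic: transfer rate per lane × encoding efficiency (128b/130b from PCIe 3.0 onward) × lane count. The comparison board in this sketch, one that feeds its second GPU from a PCIe 4.0 x4 slot, is a hypothetical example rather than a configuration from the post, and the figures ignore packet and protocol overhead.

```python
# Per-lane transfer rate (GT/s) and line-code efficiency by PCIe generation.
# Generations 3 and later use 128b/130b encoding.
PCIE_GENS = {
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s (10^9 bytes), before protocol overhead."""
    rate_gt_s, efficiency = PCIE_GENS[gen]
    return rate_gt_s * efficiency * lanes / 8  # 8 bits per byte


if __name__ == "__main__":
    # The post's platform: each GPU on a PCIe 3.0 x8 link.
    print(f"PCIe 3.0 x8: {pcie_gb_per_s(3, 8):.2f} GB/s per direction")
    # Hypothetical newer board that wires the second GPU's slot as PCIe 4.0 x4.
    print(f"PCIe 4.0 x4: {pcie_gb_per_s(4, 4):.2f} GB/s per direction")
```

Both links work out to roughly 7.9 GB/s per direction, so a Gen3 x8/x8 split is not automatically worse for the second card than a newer board's x16 + x4 layout.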

// TAGS

qwen3.6, qwen, quantization, llama.cpp, cuda, rtx-3060, dual-gpu, local-first, benchmark, llm

DISCOVERED: 2026-05-27

PUBLISHED: 2026-05-26

RELEVANCE: 8/10

AUTHOR: akira3weet