YOU ARE VIEWING ONE ITEM FROM THE AICRIER FEED

Dual RTX 3060s power $400 Qwen3.6-27B rig

// BENCHMARK RESULT · 2h ago

This post benchmarks an ultra-budget dual-RTX 3060 setup running Unsloth’s Qwen3.6-27B GGUF variants in llama.cpp with the CUDA backend. The author reports strong, stable throughput on a dated PCIe 3.0 x8/x8 platform: with MTP (multi-token prediction) enabled, generation reaches the low 40s t/s, while non-MTP mode leaves room for more context at a still-solid ~30 t/s. The main tradeoff is that tensor-parallel mode currently cannot be combined with KV-cache quantization, which caps usable context and makes very long prompts awkward.
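To see why losing KV-cache quantization caps context, here is a rough sizing sketch. It assumes the standard estimate for an attention KV cache (one K and one V vector per KV head, per attention layer, per token) and llama.cpp's q8_0/q4_0 block layouts (32 values plus one fp16 scale per block). The model dimensions and the 6 GiB budget are hypothetical placeholders, not Qwen3.6-27B's real configuration, and real usage adds compute buffers and padding on top.

```python
"""Rough KV-cache sizing sketch (hypothetical dimensions, not Qwen3.6-27B's real config)."""

# Approximate storage cost per cached element, in bits.
# q8_0 and q4_0 store blocks of 32 values plus one fp16 scale.
BITS_PER_ELEMENT = {
    "f16": 16.0,
    "q8_0": 8.5,  # (32 * 8 + 16) / 32
    "q4_0": 4.5,  # (32 * 4 + 16) / 32
}


def kv_bytes_per_token(n_attn_layers: int, n_kv_heads: int, head_dim: int, cache_type: str) -> float:
    """Bytes of K plus V cache stored for one token across all attention layers."""
    elements = 2 * n_attn_layers * n_kv_heads * head_dim  # 2 = one K and one V vector
    return elements * BITS_PER_ELEMENT[cache_type] / 8


def max_context(kv_budget_gib: float, cache_type: str, **dims: int) -> int:
    """Largest token count whose KV cache fits in the given budget."""
    budget_bytes = kv_budget_gib * 1024**3
    return int(budget_bytes // kv_bytes_per_token(cache_type=cache_type, **dims))


if __name__ == "__main__":
    # HYPOTHETICAL dimensions and budget, chosen only to illustrate the ratio.
    dims = dict(n_attn_layers=16, n_kv_heads=4, head_dim=256)
    for ct in ("f16", "q8_0", "q4_0"):
        print(ct, max_context(6.0, ct, **dims))
```

With these placeholder numbers the script reports 98304 tokens for f16, 185042 for q8_0, and 349525 for q4_0: a quantized cache roughly doubles or quadruples capacity for the same memory, which is the headroom tensor-parallel mode currently gives up.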

// ANALYSIS

Hot take: this is the kind of result that makes a “budget local LLM rig” feel real rather than theoretical. Two cheap 3060s running llama.cpp on CUDA deliver far more than their price would suggest, especially on stability.

  • Prefill stays healthy even at 12k context, landing around 456 t/s with MTP after an initial peak above 600 t/s.
  • Generation reaches 43.26 t/s with MTP and about 31 t/s without it, a strong tradeoff for local use given that non-MTP mode leaves room for more context.
  • The old i7-4770K/Z87 platform is not the bottleneck people would assume, because PCIe 3.0 x8/x8 is competitive with many newer consumer board lane splits.
  • The biggest downside is architectural: `-sm tensor` cannot currently be combined with KV-cache quantization, so 160k-class contexts are out of reach in this configuration.
  • vLLM appears to be the wrong tool for this VRAM-constrained setup; llama.cpp is the practical winner.
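The PCIe claim about the old platform can be checked with back-of-the-envelope arithmetic. This sketch computes theoretical one-direction link bandwidth from each generation's transfer rate and line encoding (8b/10b for Gen1/2, 128b/130b from Gen3 onward); it ignores packet and protocol overhead, so real throughput is lower. The "newer board" layouts are hypothetical examples, assuming a second GPU slot wired as PCIe 4.0 x4, a common chipset-fed arrangement, and are not taken from the post.

```python
from fractions import Fraction

# Transfer rate (GT/s per lane) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (Fraction(5, 2), Fraction(8, 10)),   # 2.5 GT/s, 8b/10b
    2: (Fraction(5), Fraction(8, 10)),      # 5 GT/s, 8b/10b
    3: (Fraction(8), Fraction(128, 130)),   # 8 GT/s, 128b/130b
    4: (Fraction(16), Fraction(128, 130)),  # 16 GT/s, 128b/130b
    5: (Fraction(32), Fraction(128, 130)),  # 32 GT/s, 128b/130b
}


def link_gb_per_s(gen: int, lanes: int) -> Fraction:
    """Theoretical one-direction payload bandwidth in GB/s, before protocol overhead."""
    rate, efficiency = PCIE_GENERATIONS[gen]
    return rate * efficiency * lanes / 8  # 8 bits per byte


if __name__ == "__main__":
    layouts = {
        "PCIe 3.0 x8 (each slot on the Z87 rig)": (3, 8),
        "PCIe 4.0 x4 (hypothetical chipset-fed second slot)": (4, 4),
        "PCIe 4.0 x8 (hypothetical x8/x8 split on a newer board)": (4, 8),
    }
    for name, (gen, lanes) in layouts.items():
        print(f"{name}: {float(link_gb_per_s(gen, lanes)):.2f} GB/s")
```

Gen3 x8 and Gen4 x4 come out identical (about 7.88 GB/s per direction), which is why the old x8/x8 split holds up against that common modern layout; a genuine Gen4 x8/x8 split would still double it. Exact `Fraction` arithmetic is used so the equality is not obscured by floating-point rounding.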

// TAGS

qwen3.6, qwen, quantization, llama.cpp, cuda, rtx-3060, dual-gpu, local-first, benchmark

DISCOVERED

2h ago

2026-05-27

PUBLISHED

4h ago

2026-05-26

RELEVANCE

8/10

AUTHOR

akira3weet