Qwen3.6-27B benchmarks on dual V100s
The benchmark looks broadly sane: Qwen3.6-27B is running across two V100 32GB cards in llama.cpp tensor-parallel mode with flash attention and an unquantized KV cache. The big story is not a misconfiguration, but the expected throughput drop as prefill depth climbs into long-context territory.
This is a credible dual-V100 setup, and the 64K prompt-processing (pp) slowdown looks like the normal cost of deeper KV-cache prefill rather than a red flag. The main question is less “is it broken?” and more “are V100s the right tradeoff if your workload is mostly text and long context?”
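For reference, here is a minimal sketch of the kind of llama-bench sweep being described. The model filename, quant level, and exact depth list are assumptions for illustration; `-sm tensor`, `--flash-attn 1`, and `-d` are the flags the thread itself mentions, explained in the list below:

```bash
# Hypothetical reproduction of the thread's setup. The model path and
# the depth list are assumptions; the split-mode, flash-attention, and
# depth flags are the ones discussed in the thread.
./llama-bench \
  -m ./Qwen3.6-27B-Q4_K_M.gguf \
  -sm tensor \
  --flash-attn 1 \
  -d 4096,16384,65536 \
  -p 512 -n 128
```

llama-bench accepts comma-separated value lists for most parameters, so a single invocation like this emits one result row per depth and makes the prefill falloff easy to read at a glance.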
- `-sm tensor` plus `--flash-attn 1` is the right llama.cpp path for multi-GPU tensor split; llama.cpp also expects a non-quantized KV cache in this mode.
- `-d` sets context depth for the test, so each run is intentionally stressing a larger KV cache and more memory traffic.
- Qwen3.6-27B is a fitting stress test here: it is a 27B dense model with a native 262K context window and a strong coding-agent bias.
- The value of 2x V100 is VRAM headroom and context comfort, not raw speed; if latency is the priority, a 3090-class card will usually be faster.
- The thread’s note about `64` CPU threads is worth revisiting, because that is probably more CPU parallelism than a single GPU-offloaded request can use; a quick thread sweep (see the sketch after this list) would settle it.
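Since thread count is cheap to test, a hedged sweep along these lines would show where the returns flatten (the thread values here are illustrative, not from the thread):

```bash
# Compare generation throughput at several CPU thread counts.
# With the model fully offloaded to the two V100s, throughput usually
# plateaus well below 64 threads; the sweep makes the knee visible.
./llama-bench \
  -m ./Qwen3.6-27B-Q4_K_M.gguf \
  -sm tensor --flash-attn 1 \
  -t 8,16,32,64 \
  -n 128
```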
DISCOVERED: 2026-05-10
PUBLISHED: 2026-05-10
AUTHOR: starkruzr