OPEN_SOURCE ↗
REDDIT · 24d ago · BENCHMARK RESULT
Qwen3.5-27B benchmark: fp8 and bf16 nearly tie
Aider runs on Qwen3.5-27B across bf16 and fp8 weight/KV-cache combinations showed little spread after 10 repeats. The takeaway is practical parity for this agentic-coding workload, not a statistically clean win for one precision setting.
// ANALYSIS
This reads like a strong practical case for fp8 on local coding workloads, at least on this GPU and stack. The bigger story is the low variance: the benchmark is not just saying "fp8 is close," it's saying the gap is small enough to be hard to separate from noise.
- Ten runs per setting still did not produce a meaningful separation, which suggests precision choice may matter less than expected for this workload
- Aider’s 224-task suite and roughly 13.3k tokens per task make this a long-context coding test, not a toy prompt benchmark
- The setup is specific: vLLM in Podman on an RTX 6000 Pro workstation GPU, so the result may not generalize cleanly to other hardware or runtimes
- The most interesting follow-up is longer-context testing, since fp8 KV cache often shows its seams when context gets stretched
- For people choosing between memory savings and tiny quality deltas, fp8 looks like the default worth trying first if their use case resembles agentic coding
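The "hard to separate from noise" claim can be sanity-checked with a simple permutation test on per-run scores. The pass rates below are illustrative placeholders, not the poster's actual numbers; with ten runs per setting and a mean gap much smaller than the run-to-run spread, the test should fail to reject the null that the two precisions perform the same.

```python
import random

# Hypothetical per-run Aider pass rates (percent) for each precision
# setting -- illustrative numbers only, not the benchmark's real data.
bf16 = [62.1, 61.5, 62.8, 61.9, 62.4, 61.7, 62.0, 62.6, 61.8, 62.2]
fp8 = [61.8, 62.3, 61.6, 62.0, 61.9, 62.5, 61.4, 62.1, 62.2, 61.7]

def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test on the difference of means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        # Re-split the pooled scores at random and recompute the gap.
        diff = abs(sum(pooled[:len(a)]) / len(a)
                   - sum(pooled[len(a):]) / len(b))
        if diff >= observed:
            hits += 1
    return hits / n_iter

p = permutation_test(bf16, fp8)
print(f"p-value: {p:.3f}")  # a large p means fp8 vs bf16 is indistinguishable here
```

A large p-value does not prove the precisions are equal, only that ten runs cannot resolve a gap this small, which matches the post's "practical parity" framing.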
// TAGS
qwen3.5-27b · benchmark · ai-coding · llm · inference · gpu · agent
DISCOVERED
2026-03-18
PUBLISHED
2026-03-18
RELEVANCE
8/10
AUTHOR
Baldur-Norddahl