OPEN_SOURCE
REDDIT // BENCHMARK RESULT
fdash nearly triples vLLM inference speed on Qwen 3.5
A developer benchmarked vLLM's speculative decoding methods on Qwen3.5-27B, finding the new fdash proposer nearly triples generation speed to 125 tokens per second. However, fdash currently lacks compatibility with 8-bit KV cache compression, demanding significantly more VRAM than native MTP alternatives.
// ANALYSIS
The speed gains from fdash are staggering, but its heavy memory tax keeps it out of reach for smaller GPU setups.
- fdash achieved 124.96 TPS versus the 46.57 TPS baseline without speculation, establishing it as a top-tier decoding method for local inference
- Qwen 3.5's native Multi-Token Prediction (MTP) is slower (84.57 TPS) but supports FP8 KV caching, making it the practical choice for VRAM-constrained environments
- Without FP8 KV cache support, fdash forces users to choose between maximum throughput and memory efficiency until vLLM expands compatibility
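The reported figures work out as follows; a minimal sketch using only the tokens-per-second numbers quoted above (variable names are ours):

```python
# Tokens-per-second figures reported in the benchmark
baseline_tps = 46.57   # no speculative decoding
fdash_tps = 124.96     # fdash proposer
mtp_tps = 84.57        # Qwen 3.5's native MTP

# Speedup relative to the non-speculative baseline
fdash_speedup = fdash_tps / baseline_tps
mtp_speedup = mtp_tps / baseline_tps

print(f"fdash speedup: {fdash_speedup:.2f}x")  # ~2.68x, i.e. "nearly triples"
print(f"MTP speedup:   {mtp_speedup:.2f}x")    # ~1.82x
```

So the "nearly triples" claim is a 2.68x speedup; MTP's 1.82x is the fallback when FP8 KV caching is required to fit the model in VRAM.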
// TAGS
vllm · qwen-3.5 · llm · inference · gpu · benchmark
DISCOVERED
7h ago
2026-04-12
PUBLISHED
11h ago
2026-04-12
RELEVANCE
8/10
AUTHOR
Sticking_to_Decaf