oMLX DFlash update shows mixed Qwen3 results
OPEN_SOURCE
REDDIT // 6h ago · BENCHMARK RESULT

Performance tests of DFlash block-diffusion speculative decoding in oMLX v0.3.5-rc1 on M2 Max hardware show mixed results. Qwen3-Coder-30B-A3B achieved a 21% speedup, while the smaller Qwen3.5-9B model saw a 44% slowdown, attributed to draft-model overhead.
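
One way to see how draft overhead can turn speculative decoding into a net slowdown is the standard back-of-the-envelope estimate from the speculative-decoding literature (Leviathan et al., 2023). The sketch below is not oMLX's or DFlash's code. It assumes each drafted token is accepted independently with probability alpha, that the block-diffusion drafter produces a whole block for a fixed cost expressed in target forward passes (draft_cost, a hypothetical parameter), and that verifying the block costs about one target pass.

```python
# Back-of-the-envelope speculative-decoding cost model (a sketch, not the
# oMLX/DFlash scheduler). Hypothetical assumptions:
#   - each drafted token is accepted independently with probability alpha;
#   - the drafter proposes `block` tokens at a fixed cost of `draft_cost`
#     target forward passes (block diffusion drafts the whole block at once);
#   - verifying the block costs one target forward pass.


def expected_tokens_per_step(alpha: float, block: int) -> float:
    """Expected tokens emitted per draft+verify step.

    Sum of alpha**i for i in 0..block: the accepted prefix plus the one token
    the target always contributes (a correction or a bonus token).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return block + 1.0
    return (1.0 - alpha ** (block + 1)) / (1.0 - alpha)


def relative_throughput(alpha: float, block: int, draft_cost: float) -> float:
    """Throughput relative to plain autoregressive decoding (1.0 = break-even).

    Plain decoding yields one token per target pass; a speculative step costs
    1 + draft_cost target passes and yields expected_tokens_per_step tokens.
    """
    return expected_tokens_per_step(alpha, block) / (1.0 + draft_cost)
```

With made-up values, relative_throughput(0.8, 4, 0.5) comes out at about 2.24x, while relative_throughput(0.2, 4, 1.0) is about 0.62x: when few drafted tokens are accepted and the drafter is expensive relative to the target, the extra work is never recovered and decoding gets slower.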

// ANALYSIS

DFlash's block-diffusion approach is a niche optimization: it only pays off when the draft model is closely aligned with the target model, so that most drafted tokens are accepted. Code generation remains the primary use case where block-based predictions are accepted often enough to justify the overhead, whereas smaller models lack the computational headroom to benefit: their target forward pass is already cheap, so the added drafting and verification cost is hard to recover. Additionally, compatibility issues with DeltaNet-based architectures currently lead to crashes.
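
The verification step mentioned above can be illustrated in its simplest, greedy (temperature 0) form. This is a minimal sketch, not DFlash's implementation: verify_block is a hypothetical helper, and it assumes the target model has already scored the prefix plus the drafted block in one batched forward pass. Sampled decoding uses a rejection-sampling acceptance rule instead. The sketch also shows why draft-target alignment matters: everything after the first disagreement is discarded.

```python
from collections.abc import Sequence


def verify_block(draft: Sequence[int], target_argmax: Sequence[int]) -> list[int]:
    """Greedy verification of one drafted block (hypothetical helper).

    draft:         the k token ids proposed by the drafter.
    target_argmax: the target model's greedy token at each of the k + 1
                   positions, from ONE batched pass over prefix + draft;
                   entry i is the target's choice given prefix + draft[:i].
    Returns the tokens to append: the longest agreeing prefix of the draft,
    then the target's own token at the first disagreement (or a bonus token
    if the whole block was accepted).
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have len(draft) + 1 entries")
    accepted: list[int] = []
    for proposed, expected in zip(draft, target_argmax):
        if proposed != expected:
            break  # first disagreement: the rest of the block is discarded
        accepted.append(proposed)
    # The target's prediction at the first unmatched position is always kept,
    # so every step makes at least one token of progress.
    accepted.append(target_argmax[len(accepted)])
    return accepted
```

For example, verify_block([5, 7, 9, 2], [5, 7, 4, 8, 1]) returns [5, 7, 4]: two drafted tokens are accepted, the third is replaced by the target's choice, and the last drafted token is thrown away along with the compute spent on it.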

// TAGS
omlx · dflash · mlx · llm · speculative-decoding · qwen3 · apple-silicon · benchmarks

DISCOVERED

2026-04-15

PUBLISHED

2026-04-15

RELEVANCE

7/10

AUTHOR

CrushingLoss