Lucebox DFlash hits 207 tok/s on RTX 3090
HN · HACKER_NEWS // 4h ago // BENCHMARK RESULT

Lucebox Hub is an open-source LLM inference optimization repo focused on hand-tuned performance for specific hardware. The highlighted benchmark is its DFlash + DDTree port for Qwen3.5-27B GGUF on an RTX 3090, for which the project reports two figures: a demo peak of 207.6 tok/s, and 129.5 tok/s on its HumanEval benchmark. The repo also includes a separate Megakernel release for Qwen3.5-0.8B, along with writeups, benchmark tables, and reproducible build instructions.

// ANALYSIS

This is less a product launch than a performance flex with real engineering substance. The interesting part is not just the raw tok/s number, but that the authors fit speculative decoding, tree verification, and a GGUF target model into the 24 GB of VRAM on a consumer card.

  • The 207 tok/s claim is tied to DFlash + DDTree on Qwen3.5-27B, not plain autoregressive decoding.
  • The repo is unusually transparent: it includes benchmark tables, hardware constraints, and implementation notes.
  • The project’s value prop is clear for local AI users: more throughput on existing RTX 3090-class cards without changing hardware.
  • The strongest audience fit is developers who care about inference kernels, quantization, and local model serving performance.
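
For readers unfamiliar with why speculative decoding changes the tok/s picture, here is a minimal Python sketch of the general draft-and-verify idea. It is not the Lucebox implementation: `target_next` and `draft_next` are hypothetical toy stand-ins for a large target model and a small draft model, the draft proposes a single chain of tokens rather than the tree of candidates that tree verification checks, and the verification loop only simulates what a real system would do in one batched forward pass of the target.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]

def target_next(ctx: List[int]) -> int:
    """Toy 'target model' (hypothetical): deterministic greedy next token."""
    return (sum(ctx) * 31 + len(ctx)) % 50

def draft_next(ctx: List[int]) -> int:
    """Toy 'draft model' (hypothetical): agrees with the target most of the time."""
    guess = target_next(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 50

def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive decoding: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out

def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding with a chain of k drafted tokens per pass.

    Returns the generated sequence and the number of verification passes.
    """
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # 1. The cheap draft model proposes k tokens in a row.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks every proposed position. A real system does
        #    this in one batched forward pass; here it is simulated per position.
        passes += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(out + proposal[:i])
            accepted.append(expected)
            if proposal[i] != expected:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            # Every draft token matched, so the same pass yields a bonus token.
            accepted.append(target(out + proposal))
        out.extend(accepted)
    return out[:goal], passes

if __name__ == "__main__":
    prompt = [1, 2, 3]
    tokens, passes = speculative_decode(target_next, draft_next, prompt, n_new=20)
    print("same output as plain greedy:", tokens == greedy_decode(target_next, prompt, 20))
    print("verification passes for 20 tokens:", passes)
```

The output is identical to plain greedy decoding of the target, but the target is consulted in fewer passes than there are generated tokens. That is why a speculative tok/s figure is not directly comparable to a plain autoregressive one.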
// TAGS
llm inference, speculative decoding, qwen3.5, rtx 3090, cuda, open source, benchmark

DISCOVERED

4h ago

2026-04-21

PUBLISHED

17h ago

2026-04-20

RELEVANCE

9/10

AUTHOR

GreenGames