Lucebox DFlash hits 207 tok/s on RTX 3090

// 45d ago · BENCHMARK RESULT

Lucebox Hub is an open-source LLM inference optimization repo focused on hand-tuned performance for specific hardware. The highlighted benchmark is its DFlash + DDTree port for Qwen3.5-27B GGUF on an RTX 3090, where the project reports a demo peak of 207.6 tok/s and 129.5 tok/s on its HumanEval benchmark. The repo also includes a separate Megakernel release for Qwen3.5-0.8B, with writeups, benchmark tables, and reproducible build instructions.

// ANALYSIS

This is less a product launch than a performance flex with real engineering substance. The interesting part is not just the raw tok/s number, but that they squeezed speculative decoding, tree verification, and a GGUF target into 24 GB on consumer hardware.
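To put the 24 GB constraint in perspective, here is a rough back-of-the-envelope sketch of weight memory at different precisions. The bits-per-weight values are illustrative assumptions, not the quantization the repo actually uses, and the estimate ignores the KV cache, the draft model, and activations.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (weights only)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 27e9   # Qwen3.5-27B, taking the name at face value
VRAM_GB = 24      # RTX 3090

# Assumed precisions for illustration; the source does not state the quant level.
for bits in (16, 8, 4):
    gb = weight_footprint_gb(N_PARAMS, bits)
    verdict = "fits" if gb < VRAM_GB else "does not fit"
    print(f"{bits:>2} bits/weight: {gb:5.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

Under these assumptions, only an aggressively quantized target leaves room for a draft model and verification buffers on the same card.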

  • The 207 tok/s claim is tied to DFlash + DDTree on Qwen3.5-27B, not plain autoregressive decoding.
  • The repo is unusually transparent: it includes benchmark tables, hardware constraints, and implementation notes.
  • The project’s value proposition is clear for local AI users: more throughput on existing RTX 3090-class cards without changing hardware.
  • The strongest audience fit is developers who care about inference kernels, quantization, and local model serving performance.
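
For readers unfamiliar with why speculative decoding is not comparable to plain autoregressive decoding, the sketch below shows the basic draft-then-verify loop in its simplest greedy, single-chain form. It is not the repo's implementation: DDTree verifies a tree of candidates, whereas this toy verifies one chain, and `target_next` and `draft_next` are hypothetical stand-ins for real models.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]

def target_next(prefix: List[int]) -> int:
    # Hypothetical "large" model: next token is (last + 1) mod 10.
    return (prefix[-1] + 1) % 10

def draft_next(prefix: List[int]) -> int:
    # Hypothetical "small" model: agrees with the target except after token 4.
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10

def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    # 1. The draft model proposes k tokens autoregressively (cheap).
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)
    # 2. The target checks each position. A real system does this in one
    #    batched forward pass; the loop here is only for clarity.
    accepted, ctx = [], list(prefix)
    for tok in proposed:
        expected = target(ctx)
        if tok != expected:
            accepted.append(expected)  # first mismatch: keep the target's token
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target(ctx))       # all matched: one extra token for free
    return accepted

def generate(prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    out, steps = list(prompt), 0
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
        steps += 1
    return out[:len(prompt) + n_new], steps

tokens, steps = generate([0], n_new=8, k=4)
print(tokens, "in", steps, "verification steps")
```

The output matches what the target alone would produce under greedy decoding, but each verification step can emit several tokens, which is where headline tok/s figures like the one above come from.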
// TAGS
llm inference, speculative decoding, qwen3.5, rtx 3090, cuda, open source, benchmark

DISCOVERED: 45d ago (2026-04-21)
PUBLISHED: 45d ago (2026-04-20)
RELEVANCE: 9/10
AUTHOR: GreenGames