Lucebox DFlash hits 207 tok/s on RTX 3090

// 90d ago · BENCHMARK RESULT

Lucebox Hub is an open-source LLM inference optimization repo focused on hand-tuned performance for specific hardware. The highlighted benchmark is its DFlash DDTree port for Qwen3.5-27B GGUF on an RTX 3090, where the project reports 207.6 tok/s at peak in a demo and 129.5 tok/s on its HumanEval benchmark. The repo also includes a separate Megakernel release for Qwen3.5-0.8B, along with writeups, benchmark tables, and reproducible build instructions.

// ANALYSIS

This is less a product launch than a performance flex with real engineering substance. The interesting part is not just the raw tok/s number but that the project fits speculative decoding, tree verification, and a GGUF target model into 24 GB of VRAM on consumer hardware.
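To see why fitting into 24 GB matters, a back-of-the-envelope budget helps. The sketch below is only arithmetic, not a measurement from the repo: it assumes "24 GB" means 24 GiB, takes 27e9 as the nominal parameter count implied by the "27B" name, and uses a hypothetical 4 GiB reservation for everything that is not target weights (draft model, KV cache, activations). The function name `max_bits_per_weight` is made up for illustration.

```python
def max_bits_per_weight(vram_bytes: float, n_params: float,
                        reserved_bytes: float = 0.0) -> float:
    """Upper bound on average bits per weight if all weights must fit
    in the VRAM that remains after `reserved_bytes` is set aside."""
    usable = vram_bytes - reserved_bytes
    if usable <= 0:
        raise ValueError("reserved_bytes leaves no room for weights")
    return usable * 8 / n_params


VRAM_BYTES = 24 * 2**30   # assumption: "24 GB" read as 24 GiB
N_PARAMS = 27e9           # assumption: nominal count from the "27B" name

# Ceiling if the whole card held nothing but target weights.
ceiling = max_bits_per_weight(VRAM_BYTES, N_PARAMS)

# Ceiling with a hypothetical 4 GiB held back for the draft model,
# KV cache, and activations.
with_reserve = max_bits_per_weight(VRAM_BYTES, N_PARAMS,
                                   reserved_bytes=4 * 2**30)

print(f"no reservation:    {ceiling:.2f} bits/weight")
print(f"4 GiB reservation: {with_reserve:.2f} bits/weight")
```

Under these assumptions the ceiling is about 7.6 bits per weight with nothing reserved and about 6.4 with the 4 GiB reservation. Either way, 16-bit weights cannot fit, so a quantized target such as a GGUF build is a precondition for running this setup on a single card.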

– The 207 tok/s claim is tied to DFlash + DDTree on Qwen3.5-27B, not plain autoregressive decoding.
– The repo is unusually transparent: it includes benchmark tables, hardware constraints, and implementation notes.
– The project’s value proposition is clear for local AI users: more throughput on existing RTX 3090-class cards without changing hardware.
– The strongest audience fit is developers who care about inference kernels, quantization, and local model serving performance.
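The distinction between speculative and plain autoregressive decoding in the first point can be shown with a toy. The sketch below is a generic greedy draft-and-verify loop, not DFlash or DDTree: it uses a single linear draft chain, not a tree, and the `target` and `draft` functions are hypothetical stand-ins for real models. It shows only the core idea that a cheap draft proposes several tokens and the expensive target confirms or corrects them, so the output matches plain decoding while needing fewer sequential target rounds.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def autoregressive(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain decoding: one sequential target step per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative(target: NextToken, draft: NextToken, prompt: List[int],
                n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-and-verify. Returns (sequence, verification rounds)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        # 1. The draft proposes up to k tokens on its own.
        ctx, proposal = list(seq), []
        for _ in range(min(k, goal - len(seq))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks the proposal. A real system scores all
        #    drafted positions in one batched pass; the toy loops instead.
        rounds += 1
        for tok in proposal:
            expected = target(seq)
            seq.append(expected)      # always keep the target's token
            if expected != tok:       # first mismatch ends this round
                break
    return seq, rounds


# Hypothetical toy models: the draft agrees with the target except when
# the context length is a multiple of 5.
def target(seq: List[int]) -> int:
    return (seq[-1] + len(seq)) % 11


def draft(seq: List[int]) -> int:
    tok = target(seq)
    return tok if len(seq) % 5 else (tok + 1) % 11


baseline = autoregressive(target, [1], 20)
spec_seq, rounds = speculative(target, draft, [1], 20, k=4)
print(spec_seq == baseline, rounds)
```

Because every kept token comes from the target, the speculative output is identical to the baseline. The gain is that several tokens can be confirmed per target round when the draft guesses well, which is why throughput figures from such setups are not comparable to plain autoregressive numbers.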

// TAGS

llm inference, speculative decoding, qwen3.5, rtx 3090, cuda, open source, benchmark

DISCOVERED

90d ago

2026-04-21

PUBLISHED

91d ago

2026-04-20

RELEVANCE

9/10

AUTHOR

GreenGames
