OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Qwen 3.5 27B hits 2,000 TPS
A LocalLLaMA user reports roughly 2,000 tokens/sec prefill throughput on a markdown-document classification workload, using an Unsloth Q5_K_XL GGUF build of Qwen 3.5 27B on an RTX 5090 running llama.cpp with CUDA 13. The setup is tuned for long inputs, minimal outputs, and batch parallelism, making it strong for high-volume classification but highly workload-specific.
// ANALYSIS
This is a strong real-world throughput datapoint for local inference, but it should be read as a specialized benchmark rather than a general performance baseline.
- The reported speed is dominated by input-heavy prefill, not long-form generation throughput.
- Disabling vision (mmproj) and using "no thinking" removed extra compute paths for this text-only task.
- Reducing context to 128k and matching parallelism to batch size (8) helped keep VRAM pressure controlled.
- The author notes evals are still partial, so accuracy and quality tradeoffs need fuller validation.
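The tuning described above can be sketched as a llama-server launch. This is a hypothetical reconstruction, not the author's actual command: the model path, port, and exact flag values are assumptions, though the flags themselves (`-c`, `--parallel`, `-ngl`, `--flash-attn`) are real llama.cpp server options.

```shell
# Hypothetical llama.cpp setup approximating the reported configuration.
# -ngl 99      : offload all layers to the GPU (RTX 5090)
# -c 131072    : cap context at 128k tokens instead of the model maximum
# --parallel 8 : 8 server slots, matched to the request batch size
# Vision stays disabled simply by not passing an --mmproj file.
llama-server \
  -m ./Qwen3.5-27B-UD-Q5_K_XL.gguf \
  -ngl 99 \
  -c 131072 \
  --parallel 8 \
  --flash-attn \
  --port 8080
```

The "no thinking" part is applied on the request/chat-template side (e.g. disabling the model's thinking mode per request) rather than at server launch, so it is omitted here.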
// TAGS
qwen3-5-27b · llm · inference · gpu · benchmark · llama-cpp
DISCOVERED
2026-03-14
PUBLISHED
2026-03-13
RELEVANCE
8/10
AUTHOR
awitod