Qwen3.5-9B hits 4.6 t/s on Jetson Orin Nano Super

// 93d ago BENCHMARK RESULT

Community benchmarks of Qwen3.5-9B (Q3_K_M) on the NVIDIA Jetson Orin Nano Super report 4.6 tokens/s using llama.cpp. The result highlights the memory-bandwidth bottleneck of entry-level edge hardware when running 9B-parameter models without a specialized inference stack.

// ANALYSIS

While 4.6 t/s is a usable baseline for local LLM tasks, the hardware is capable of significantly higher throughput with deeper optimization.

–Memory bandwidth (102 GB/s) is the primary ceiling: each generated token streams the full set of quantized weights from memory, which caps a 9B model at this quantization at a theoretical maximum of ~20 t/s
–Standard llama.cpp builds often underutilize Jetson's Tensor Cores; switching to MLC LLM or TensorRT-LLM may double or triple these speeds
–8GB of unified memory is tight for 9B models, requiring aggressive quantization (Q3/Q4) to leave sufficient room for the KV cache
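–Enabling "Super Mode" via nvpmodel -m 0 (confirm the mode ID with nvpmodel -q, since it can differ between JetPack releases) and locking clocks with jetson_clocks is needed for consistent performance at this scale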
–Developers who need faster inference should target the 2B variant, which can hit 15-20 t/s on the same hardware
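
The bandwidth ceiling cited above can be reproduced with a back-of-the-envelope estimate. The sketch below assumes that decoding one token reads every quantized weight exactly once and that Q3_K_M averages roughly 4 bits per weight; both are simplifying assumptions rather than measured values, and the function names are illustrative only.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def decode_ceiling_tps(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on tokens/s if each token streams all weights once."""
    return bandwidth_gb_s / model_gb


BANDWIDTH_GB_S = 102.0  # memory bandwidth quoted in the post
PARAMS_B = 9.0          # parameter count taken from the model name
BITS_PER_WEIGHT = 4.0   # ASSUMPTION: rough average for Q3_K_M
MEASURED_TPS = 4.6      # throughput reported in the benchmark

size = weights_gb(PARAMS_B, BITS_PER_WEIGHT)
ceiling = decode_ceiling_tps(BANDWIDTH_GB_S, size)
utilization = MEASURED_TPS / ceiling

print(f"weights:     {size:.1f} GB")
print(f"ceiling:     {ceiling:.1f} t/s")
print(f"utilization: {utilization:.0%} of the bandwidth bound")
```

Under these assumptions the weights occupy about 4.5 GB and the ceiling lands a little above 20 t/s, so the measured 4.6 t/s uses roughly a fifth of the available bandwidth. The estimate ignores KV-cache reads and compute overhead, so real-world limits sit below it.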

// TAGS

qwen3-5-9b · llm · edge-ai · benchmark · gpu · inference · open-weights

DISCOVERED

93d ago

2026-03-08

PUBLISHED

96d ago

2026-03-06

RELEVANCE

8/10

AUTHOR

Otherwise-Sir7359
