Qwen3.5-9B hits 4.6 t/s on Jetson Orin Nano Super

// 80d ago · BENCHMARK RESULT

Community benchmarks for Qwen3.5-9B (Q3_K_M) on the NVIDIA Jetson Orin Nano Super show a throughput of 4.6 tokens/s using llama.cpp. The results highlight the memory bandwidth bottleneck of entry-level edge hardware when running 9B-parameter models without a specialized inference stack.

// ANALYSIS

While 4.6 t/s is a usable baseline for local LLM tasks, the hardware is capable of significantly higher throughput with deeper optimization.

  • Memory bandwidth (102 GB/s) is the primary ceiling: each generated token has to stream the full set of quantized weights from memory, which caps a 9B model at a theoretical maximum of roughly 20 t/s
  • Standard llama.cpp builds often underutilize Jetson's Tensor Cores; switching to MLC LLM or TensorRT-LLM could potentially double or triple these speeds
  • 8GB of unified memory is tight for 9B models, requiring aggressive quantization (Q3/Q4) to leave sufficient room for the KV cache
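  • Enabling "Super Mode" via nvpmodel -m 0 and locking clocks with jetson_clocks is needed for consistent performance at this scale; the mode ID can differ between JetPack releases, so confirm the active mode with nvpmodel -q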
  • Developers seeking high-velocity inference should target the 2B variant, which can hit 15-20 t/s on the same hardware
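
The bandwidth ceiling and the memory squeeze described above can be checked with a short back-of-envelope sketch. It assumes decoding is purely bandwidth-bound (every token reads all weights once) and that Q3_K_M averages about 3.9 bits per weight. That figure is an assumption, not something stated in the benchmark, and the function names are illustrative.

```python
# Back-of-envelope decode ceiling for a memory-bandwidth-bound model.
# Figures marked "from the item" are quoted in the text above;
# everything else is an assumption made for illustration.

GB = 1e9  # decimal gigabytes, matching how bandwidth is usually quoted


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return params * bits_per_weight / 8


def decode_ceiling_tps(bandwidth_bytes_per_s: float, bytes_per_token: float) -> float:
    """Upper bound on tokens/s if each token streams all weights exactly once."""
    return bandwidth_bytes_per_s / bytes_per_token


params = 9e9            # 9B parameters (from the item)
bits_per_weight = 3.9   # ASSUMPTION: rough average for Q3_K_M
bandwidth = 102 * GB    # memory bandwidth (from the item)
measured_tps = 4.6      # measured throughput (from the item)
total_memory = 8 * GB   # unified memory (from the item)

weights = weight_bytes(params, bits_per_weight)
ceiling = decode_ceiling_tps(bandwidth, weights)

print(f"weights: {weights / GB:.2f} GB")
print(f"ceiling: {ceiling:.1f} t/s")
print(f"measured / ceiling: {measured_tps / ceiling:.0%}")
print(f"left for KV cache, OS, runtime: {(total_memory - weights) / GB:.2f} GB")
```

Under these assumptions the weights come to roughly 4.4 GB and the ceiling lands in the low 20s of t/s, in line with the ~20 t/s figure above. The measured 4.6 t/s is about a fifth of that, and well under 4 GB remains for the KV cache, the OS, and the runtime.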
// TAGS
qwen3-5-9b, llm, edge-ai, benchmark, gpu, inference, open-weights

DISCOVERED

80d ago

2026-03-08

PUBLISHED

83d ago

2026-03-06

RELEVANCE

8/10

AUTHOR

Otherwise-Sir7359