Qwen3.5-9B hits 4.6 t/s on Jetson Orin Nano Super
OPEN_SOURCE
REDDIT · 34d ago · BENCHMARK RESULT


Community benchmarks for Qwen3.5-9B (Q3_K_M) on the NVIDIA Jetson Orin Nano Super show 4.6 tokens/s of generation throughput using llama.cpp. The result highlights the memory bandwidth bottleneck of entry-level edge hardware when running 9B-parameter models without a specialized inference stack.
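For scale, a rough footprint check shows why aggressive quantization is required to fit a 9B model on this board. The bits-per-weight figures are approximate community estimates for llama.cpp K-quants, not exact values:

```python
# Rough weight-memory footprint for a 9B-parameter model at common
# llama.cpp quantization levels. bpw values are approximate averages.
PARAMS = 9e9
GiB = 1024**3

for name, bpw in [("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q3_K_M", 3.9)]:
    weights_gib = PARAMS * bpw / 8 / GiB
    print(f"{name}: ~{weights_gib:.1f} GiB of weights")
```

Under these assumptions Q8_0 (~8.9 GiB) already exceeds the 8 GB of unified memory before the OS or KV cache is counted, while Q3_K_M (~4.1 GiB) leaves a few gigabytes of headroom.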

// ANALYSIS

While 4.6 t/s is a usable baseline for local LLM tasks, the hardware is capable of significantly higher throughput with deeper optimization.

  • Memory bandwidth (102 GB/s) is the primary ceiling, mathematically limiting a 9B model pass to a theoretical max of ~20 t/s
  • Standard llama.cpp builds often underutilize Jetson's Tensor Cores; switching to MLC LLM or TensorRT-LLM could potentially double or triple these speeds
  • 8GB of unified memory is tight for 9B models, requiring aggressive quantization (Q3/Q4) to leave sufficient room for the KV cache
  • Enabling "Super Mode" via nvpmodel -m 0 and locking clocks is mandatory for consistent performance at this scale
  • Developers seeking high-velocity inference should target the 2B variant, which can hit 15-20 t/s on the same hardware
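The bandwidth ceiling in the first bullet can be sketched as a back-of-envelope calculation. Decoding streams roughly the full weight set through memory per generated token, so bandwidth divided by bytes-per-token bounds throughput; the ~5 GB-per-token figure is an assumption for a Q3_K_M 9B model, not a measured value:

```python
# Bandwidth-bound decode ceiling: tokens/s <= bandwidth / bytes per token.
BANDWIDTH_GBS = 102.0     # GB/s, Jetson Orin Nano Super (from the post)
BYTES_PER_TOKEN_GB = 5.0  # assumed: Q3_K_M 9B weights + cache traffic

ceiling_tps = BANDWIDTH_GBS / BYTES_PER_TOKEN_GB   # ~20 t/s
measured_tps = 4.6                                  # reported llama.cpp result
print(f"theoretical ceiling: ~{ceiling_tps:.0f} t/s")
print(f"measured efficiency: {measured_tps / ceiling_tps:.0%}")
```

The reported 4.6 t/s is therefore only about a quarter of the theoretical memory-bound maximum, which is consistent with the claim that a better-tuned stack (MLC LLM, TensorRT-LLM) has significant headroom.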
// TAGS
qwen3-5-9b · llm · edge-ai · benchmark · gpu · inference · open-weights

DISCOVERED

34d ago

2026-03-08

PUBLISHED

37d ago

2026-03-06

RELEVANCE

8/10

AUTHOR

Otherwise-Sir7359