Qwen 1.5B Crawls on CPU
OPEN_SOURCE · REDDIT · INFRASTRUCTURE // 17d ago

A Reddit user reports that Qwen's 1.5B model reaches only 0.12 tokens/sec on a Redmi 12 Android phone and asks whether that speed is normal. Replies point to a CPU-bound setup and, most likely, missing mobile GPU offload.

// ANALYSIS

Small models are still painfully slow when the runtime falls back to CPU on weak mobile hardware.

  • On a Redmi 12-class Snapdragon 4 Gen 1 phone, memory bandwidth and thermals can dominate long before parameter count does.
  • llama.cpp officially supports Adreno OpenCL, but Snapdragon Hexagon offload is still marked in progress, so Android acceleration is backend-dependent.
  • 0.12 tok/s is consistent with a CPU-only path that is also throttling or swapping; even an unaccelerated 1.5B model should normally run faster than this, so the app is likely not offloading work at all.
  • For local mobile LLMs, backend support and quantization matter as much as model size.
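The bandwidth point above can be made concrete with a roofline-style estimate: on CPU, token-by-token decoding is usually memory-bandwidth bound, so throughput is roughly effective bandwidth divided by the bytes read per token (about the quantized model size). The numbers below (quantized size, phone bandwidth, sustained-fraction efficiency) are illustrative assumptions, not measurements from the thread:

```python
def est_tokens_per_sec(model_bytes: float,
                       bandwidth_gbs: float,
                       efficiency: float = 0.3) -> float:
    """Rough upper bound on decode speed for a bandwidth-bound run.

    model_bytes:   bytes streamed per generated token (~quantized model size)
    bandwidth_gbs: peak memory bandwidth in GB/s
    efficiency:    fraction of peak a CPU-only path sustains (assumed)
    """
    return (bandwidth_gbs * 1e9 * efficiency) / model_bytes


# Assumed figures: a 1.5B model at 4-bit quantization is ~1.0 GB of
# weights; a Snapdragon 4 Gen 1-class phone has on the order of 17 GB/s
# of LPDDR4X bandwidth.
q4_bytes = 1.0e9
print(f"{est_tokens_per_sec(q4_bytes, 17.0):.1f} tok/s")  # ~5 tok/s under these assumptions
```

Even under pessimistic assumptions the bandwidth ceiling sits at a few tokens per second, not 0.12, which supports the thread's diagnosis that something beyond a plain CPU fallback (throttling, swapping, or a single-threaded path) is dragging the run down.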
// TAGS
qwen · llama.cpp · llm · inference · edge-ai · gpu · open-weights · self-hosted

DISCOVERED: 17d ago · 2026-03-25

PUBLISHED: 18d ago · 2026-03-25

RELEVANCE: 7/10

AUTHOR: Ambitious-Cod6424