REDDIT // 17d ago // INFRASTRUCTURE
Qwen 1.5B Crawls on CPU
A Reddit user says Qwen's 1.5B model only reaches 0.12 tokens/sec on a Redmi 12 Android phone and asks whether that speed is normal. The replies point to a CPU-bound setup and likely missing mobile GPU offload.
// ANALYSIS
Small models are still painfully slow when the runtime falls back to CPU on weak mobile hardware.
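- On a Redmi 12-class budget phone (an entry-level SoC such as the Helio G88 or a Snapdragon 4-series chip, depending on the variant), memory bandwidth and thermal throttling can limit throughput long before parameter count does.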
- llama.cpp officially supports Adreno OpenCL, but Snapdragon Hexagon offload is still marked in progress, so Android acceleration is backend-dependent.
- 0.12 tok/s is consistent with a CPU-only path, even for a 1.5B model, if the app is not actually offloading work.
- For local mobile LLMs, backend support and quantization matter as much as model size.
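
To illustrate why quantization matters as much as model size, here is a rough back-of-envelope sketch in Python. It assumes token generation is limited by memory bandwidth, with every weight read once per generated token. The 10 GB/s bandwidth figure is a hypothetical placeholder, not a measured value for the Redmi 12, and the function names are illustrative only.

```python
def model_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in bytes."""
    return params_billion * 1e9 * bits_per_weight / 8


def decode_ceiling_tok_s(params_billion: float,
                         bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Rough upper bound on tokens/sec if decoding is purely bandwidth-bound.

    Assumes each generated token streams all weights from memory once.
    """
    return bandwidth_gb_s * 1e9 / model_bytes(params_billion, bits_per_weight)


if __name__ == "__main__":
    # ASSUMPTION: placeholder bandwidth, not a measurement of any real phone.
    ASSUMED_BANDWIDTH_GB_S = 10.0
    for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
        ceiling = decode_ceiling_tok_s(1.5, bits, ASSUMED_BANDWIDTH_GB_S)
        print(f"1.5B @ {label}: ceiling of about {ceiling:.1f} tok/s")
```

Under this model, halving the bits per weight doubles the ceiling, while parameter count stays fixed. The result is only a ceiling: it ignores compute limits, the KV cache, thermal throttling, and memory pressure, any of which can push real throughput far lower on weak hardware.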
// TAGS
qwen · llama.cpp · llm · inference · edge-ai · gpu · open-weights · self-hosted
DISCOVERED: 2026-03-25 (17d ago)
PUBLISHED: 2026-03-25 (18d ago)
RELEVANCE: 7/10
AUTHOR: Ambitious-Cod6424