
whichllm ranks the best local LLMs for your hardware
A CLI tool that auto-detects system hardware to rank and launch the best-performing local LLMs from HuggingFace. It optimizes model selection by matching specific quantizations to available VRAM while factoring in real-world speed and quality benchmarks.
whichllm solves the "black box" problem of local LLM performance by providing empirical rankings based on a user's specific GPU and RAM.
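The fit check at the heart of this kind of ranking can be approximated from first principles: a model's weight footprint is roughly its parameter count times bits per weight, plus headroom for the KV cache and runtime buffers. Below is a minimal Python sketch of that idea; the function names, the 20% overhead factor, and the quantization table are illustrative assumptions, not whichllm's actual scoring code.

```python
# Rough VRAM-fit estimate: weights ~= params * bits/8, plus runtime overhead.
# The overhead factor and quant table are illustrative assumptions, not
# whichllm's actual scoring model.

QUANT_BITS = {"Q4_K_M": 4.5, "Q5_K_M": 5.5, "Q8_0": 8.5, "F16": 16.0}

def weight_gb(params_b: float, quant: str) -> float:
    """Approximate weight size in GiB for a model with params_b billion params."""
    bits = QUANT_BITS[quant]
    return params_b * 1e9 * bits / 8 / 2**30

def fits(params_b: float, quant: str, vram_gb: float, overhead: float = 1.2) -> bool:
    """Does the quantized model fit in VRAM, with headroom for KV cache/buffers?"""
    return weight_gb(params_b, quant) * overhead <= vram_gb

if __name__ == "__main__":
    for q in QUANT_BITS:
        need = weight_gb(8.0, q) * 1.2
        print(f"8B @ {q}: ~{need:.1f} GiB needed, fits in 12 GiB: {fits(8.0, q, 12)}")
```

On a 12 GiB card this flags an 8B model as fitting at 4-, 5-, and 8-bit quantizations but not at F16, which is the kind of per-quantization ranking the tool describes.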
- Hardware-specific scoring accounts for VRAM overhead, memory bandwidth, and model architecture for accurate performance predictions (the fit arithmetic is sketched above).
- The "plan" command offers a reverse lookup for hardware buyers, identifying the components needed to run specific models like Llama 3 or Qwen (see the sketch after this list).
- Integrated execution via isolated uv environments makes it a one-command alternative to complex setups for rapid model exploration.
- v0.5.2 improves Apple Silicon estimation and multimodal model scoring, so unified-memory systems and vision-capable LLMs are ranked correctly.
- Live data fetching from HuggingFace keeps rankings tracking the latest releases and benchmark shifts in real time (a fetch sketch follows this list).
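The "plan" reverse lookup can be read as the same arithmetic run backwards: given a target model and quantization, solve for the minimum VRAM. A small illustrative function under the same assumed 20% overhead factor, not whichllm's actual planner:

```python
# Inverse of the fit check: minimum VRAM (GiB) needed for a target model,
# under an assumed 20% overhead for KV cache and buffers. Illustrative only;
# not whichllm's actual "plan" logic.
def min_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    return params_b * 1e9 * bits_per_weight / 8 / 2**30 * overhead

# e.g. Llama 3 8B at ~4.5 bits/weight (Q4_K_M): roughly 5 GiB
print(f"{min_vram_gb(8.0, 4.5):.1f} GiB")
```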
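Live ranking implies querying the Hub at runtime rather than shipping a baked-in model list. A hedged sketch of what that fetch could look like using the official huggingface_hub client; the "gguf" tag filter and download-count sort are assumptions about how whichllm selects candidates, not confirmed behavior.

```python
# Fetch popular GGUF-tagged models from the HuggingFace Hub, sorted by
# downloads. Uses the official huggingface_hub client; the tag filter and
# sort choice are assumptions about how a tool like whichllm might work.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(filter="gguf", sort="downloads", direction=-1, limit=10):
    print(model.id, model.downloads)
```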
DISCOVERED: 2026-05-15
PUBLISHED: 2026-05-15
AUTHOR: andyyyy64