DeepSeek's 8B reasoning distill hits reality check
A LocalLLaMA user running DeepSeek-R1-0528-Qwen3-8B in LM Studio on an M4 MacBook reports that the model spent more than a minute thinking, then produced pages of unusable output on a simple CSV-conversion task. That complaint cuts against DeepSeek's own benchmark-heavy positioning for the model and lines up with broader community criticism that the 8B distill can look strong on evals yet feel erratic in real local workflows.
This is the classic local-LLM trap: benchmark-chasing small reasoning models often impress on leaderboards, then burn tokens and collapse on boring real work.
- –DeepSeek's model card pitches the 8B distill as SOTA among open 8B reasoning models, but the Reddit post highlights the gap between eval wins and everyday utility
- –Hugging Face discussion threads around the same model include similar complaints about overlong reasoning, poor instruction following, and unreliable basic-task behavior
- –The model appears highly setup-sensitive: DeepSeek recommends a specific system prompt and temperature 0.6, while some users report better results in Ollama than in other local runtimes
- –For AI developers, the lesson is practical: small reasoning distills are not automatically the best default local assistants, especially for structured extraction jobs where stable instruction following matters more than chain-of-thought bravado
DISCOVERED
81d ago
2026-03-08
PUBLISHED
81d ago
2026-03-08
RELEVANCE
AUTHOR
EconomicsHelpful4593