OPEN_SOURCE
REDDIT · 23d ago · BENCHMARK RESULT
Unsloth Qwen3.5-35B IQ4_XS tops 100 t/s
A Reddit user says Unsloth’s Qwen3.5-35B-A3B-UD-IQ4_XS now runs cleanly in the latest Ooba build, hitting around 100 tokens/sec on a 3090 with a huge context window. Their 3D Snake demo suggests the bigger win is not flash, but a local model that can stay on task and actually finish a bounded coding job.
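For readers who want to try the same quant outside the Ooba UI, here is a minimal sketch using llama.cpp's `llama-server` directly. The model filename, context size, and port are assumptions for illustration; the post itself used text-generation-webui.

```shell
# Hypothetical setup, not taken from the post: serve the IQ4_XS GGUF with
# llama.cpp. -ngl 99 offloads all layers to the GPU (e.g. a 3090);
# -c sets the context window, here an assumed 32K.
llama-server \
  -m Qwen3.5-35B-A3B-UD-IQ4_XS.gguf \
  -ngl 99 \
  -c 32768 \
  --port 8080
```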
// ANALYSIS
This is the kind of local-model result that matters: not leaderboard bragging rights, but a fast enough, obedient enough model that feels usable in real workflows.
- Roughly 100 t/s on a 3090 changes the experience from “offline batch run” to “interactive assistant,” which is a big deal for coding and agent loops
- The key signal is persistence under iteration: the model reportedly fixed its own mistakes and delivered a working Three.js demo after other models kept breaking the app
- That makes it a strong candidate for agentic tooling like Cline, where multi-step follow-through matters more than one-shot cleverness
- Unsloth’s own Qwen3.5 docs position the family around 256K context, so this quant sits in a sweet spot of speed, memory efficiency, and long-context practicality
- The caveat is scope: this is a strong anecdote, not a broad eval suite, so repo-scale refactors and tool-use reliability still need wider testing
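To make the “interactive assistant” point concrete, a back-of-envelope latency check. The ~100 t/s figure is from the post; the response length and the slower comparison rate are assumed for illustration.

```python
# Rough arithmetic: how long a reply takes to stream at a given decode rate.
def response_latency_s(tokens: int, tok_per_s: float) -> float:
    """Seconds to generate a response of `tokens` length at `tok_per_s`."""
    return tokens / tok_per_s

# A typical 500-token coding reply at the reported ~100 t/s:
print(response_latency_s(500, 100.0))  # 5.0 seconds -- feels interactive

# The same reply at an assumed 15 t/s, common for larger dense models
# squeezed onto a single consumer GPU:
print(round(response_latency_s(500, 15.0), 1))  # ~33 seconds -- a batch run
```

The gap between a 5-second and a 33-second turnaround is what separates an assistant you iterate with from one you queue jobs to.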
// TAGS
llm · inference · gpu · benchmark · self-hosted · unsloth · qwen3.5-35b-a3b
DISCOVERED
2026-03-20
PUBLISHED
2026-03-20
RELEVANCE
9/10
AUTHOR
EuphoricPenguin22