RTX PRO 4500 Blackwell runs Qwen3.6-27B
One user reports a Qwen3.6-27B UD-Q5_K_XL setup running cleanly on an RTX PRO 4500 Blackwell through llama.cpp, with 131k context, full GPU offload, and about 35.8 tokens/sec generation. It looks like a solid local coding rig, but not a reason to assume a much bigger model will automatically feel smarter.
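A setup like the one reported would typically be launched along these lines. The flags are standard llama.cpp server options, but the model filename is a placeholder and exact flag spellings should be checked against the installed llama.cpp build:

```shell
# Sketch of a llama.cpp server launch matching the reported setup.
# The model filename is a placeholder. -c sets the 131072-token context,
# -ngl 99 offloads every layer to the GPU, and -fa enables flash attention.
./llama-server \
  -m Qwen3.6-27B-UD-Q5_K_XL.gguf \
  -c 131072 \
  -ngl 99 \
  -fa
```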
Good local inference box, not a magic intelligence upgrade. On 32GB of VRAM, the win is fit and responsiveness: Qwen3.6-27B is in the sweet spot where a serious coding model is usable without giving up too much speed.
- The RTX PRO 4500 Blackwell is a 32GB card, so extra system RAM does not change the actual model-fit ceiling.
- About 35.8 tok/s is strong enough for interactive coding, refactors, and Roo-style agent loops, especially with flash-attn and full GPU offload.
- Going larger usually means slower tokens, tighter context budgets, or heavier quantization; that can feel worse than a well-tuned 27B.
- If the goal is "smarter," better gains usually come from a stronger checkpoint, repo-aware retrieval, and tighter prompting than from blindly scaling parameters.
- For UE5 work, this setup is best at file edits, engine scripting, code review, and local/private context rather than frontier-level reasoning.
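The fit-vs-context tradeoff in the bullets above can be sanity-checked with rough VRAM arithmetic. The architecture numbers below (layer count, KV heads, head dimension) are illustrative assumptions, not published specs for Qwen3.6-27B, and 5.7 bits/weight is a typical ballpark for a Q5_K_XL-class quant:

```python
# Rough VRAM budget for a ~27B model at ~5.7 bits/weight with a 131k context.
# All architecture numbers here are assumed for illustration only.

GIB = 2**30

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_len: int, bytes_per_elem: float) -> float:
    """K and V each hold ctx_len * head_dim values per KV head per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

weights_gib = 27e9 * 5.7 / 8 / GIB                    # ~17.9 GiB of quantized weights
kv_gib = kv_cache_bytes(48, 8, 128, 131072, 1) / GIB  # q8_0 KV cache: 12.0 GiB

print(f"weights ~{weights_gib:.1f} GiB + KV cache ~{kv_gib:.1f} GiB "
      f"= ~{weights_gib + kv_gib:.1f} GiB on a 32 GiB card")

# At fp16 KV (2 bytes per element) the cache alone would be ~24 GiB, which is
# why long-context runs on 32 GB lean on KV quantization and/or aggressive GQA.
```

With these assumed numbers the total lands just under 32 GiB, consistent with the reported full offload; a meaningfully bigger checkpoint would have to give back context length or precision to fit.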
DISCOVERED: 2026-05-09 (1h ago)
PUBLISHED: 2026-05-09 (3h ago)
AUTHOR: Merstin