OPEN_SOURCE
REDDIT // 2d ago · BENCHMARK RESULT
llama.cpp Q8 mmproj matches FP16
A Reddit tester compared Q8 and FP16 multimodal projectors across small vision models in llama.cpp and found mostly identical results. The main exception was Qwen3.5 4B, where FP16 sometimes looked noisier or less grounded than Q8 in edge cases.
// ANALYSIS
Anecdotal, but directionally useful: for local multimodal inference, `mmproj` precision may matter far less than the conventional FP16 default suggests.
- Across most models, swapping Q8 for FP16 changed phrasing and confidence more than actual image understanding
- Qwen3.5 0.8B seemed to gain slightly from FP16, which may say more about tiny-model instability on the text side than about vision precision
- Qwen3.5 4B was the surprise: FP16 sometimes overfocused on irrelevant detail, while Q8 picked up the obvious object
- The post's setup is CPU-only at temperature 0 and self-described as informal, so this is not a benchmark verdict
- Still, it points to a practical default for local runs: a Q8 mmproj may be enough unless you have a specific reason to keep FP16
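For anyone who wants to repeat the comparison locally, a minimal sketch using llama.cpp's multimodal CLI (`llama-mtmd-cli` with `--mmproj`); all model and mmproj file paths here are placeholders, not the poster's actual files:

```shell
# Run the same image and prompt through both projector precisions,
# then diff the outputs. Paths below are placeholders.
MODEL=./model-Q8_0.gguf
IMAGE=./test.jpg
PROMPT="Describe this image."

# FP16 multimodal projector
./llama-mtmd-cli -m "$MODEL" --mmproj ./mmproj-F16.gguf \
    --image "$IMAGE" -p "$PROMPT" --temp 0 > out-fp16.txt

# Q8 multimodal projector, same everything else
./llama-mtmd-cli -m "$MODEL" --mmproj ./mmproj-Q8_0.gguf \
    --image "$IMAGE" -p "$PROMPT" --temp 0 > out-q8.txt

# At temp 0 the runs are deterministic, so any diff is the projector's doing
diff out-fp16.txt out-q8.txt
```

Temperature 0 matches the post's setup and makes the two runs directly comparable, since the only variable left is the projector precision.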
// TAGS
llama-cpp · multimodal · inference · benchmark · open-source
DISCOVERED
2026-04-09
PUBLISHED
2026-04-09
RELEVANCE
8/10
AUTHOR
WhoRoger