OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Mistral Medium 3.5 benchmarked on 3x3090
A Reddit user shares real-world local inference speeds for Mistral Medium 3.5 Q3_K_M on three RTX 3090s, with examples across Python, SVG, and HTML generation. It's a practical data point for anyone judging whether a 128B open-weight model is usable on high-end consumer hardware.
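For anyone wanting to reproduce this kind of number on their own stack, here is a minimal throughput probe against a local OpenAI-compatible endpoint (a sketch only: the URL, model id, and prompts are placeholder assumptions, not taken from the post, and the timing lumps prompt processing and decoding together):

```python
# Rough tokens/sec probe against a local OpenAI-compatible server
# (e.g. llama.cpp's llama-server). URL and model id are placeholders.
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed local endpoint

def tokens_per_second(prompt: str, max_tokens: int = 512) -> float:
    start = time.perf_counter()
    resp = requests.post(URL, json={
        "model": "mistral-medium-3.5-q3_k_m",  # hypothetical model id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    })
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    # End-to-end rate: includes prompt processing, not pure decode speed.
    return resp.json()["usage"]["completion_tokens"] / elapsed

# Mirror the post's prompt mix: code, SVG, and HTML generation.
for task in ("Write a Python quicksort.",
             "Draw a smiley face as inline SVG.",
             "Write a minimal HTML landing page."):
    print(f"{task:40s} -> {tokens_per_second(task):.1f} tok/s")
```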
// ANALYSIS
This is less a launch story than a deployment reality check: Mistral says Medium 3.5 can self-host on as few as four GPUs, and this post shows what that looks like in the wild.
- Q3 quantization is doing the heavy lifting here; without it, a 128B model wouldn't fit in 72GB of VRAM at all (FP16 weights alone run ~256GB). See the arithmetic sketch after this list.
- The varied prompt types matter: code, SVG, and HTML generation stress the model and runtime differently, so the screenshots are more informative than a single synthetic tokens/sec number.
- For local AI builders, this is the core tradeoff: you get an open-weight, near-frontier model, but you still pay in GPU count, power, and tuning effort.
- If output quality holds up at this speed, Medium 3.5 becomes interesting for self-hosted coding agents, not just chat demos.
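As a rough check on the quantization point above, here is back-of-the-envelope weight-memory arithmetic (a sketch; the bits-per-weight figures are approximate averages for llama.cpp quant formats, and real usage adds KV cache, activations, and runtime overhead on top):

```python
# Approximate weight memory for a 128B-parameter model at different
# precisions. Ignores KV cache, activations, and runtime overhead.
PARAMS = 128e9

def weight_gb(bits_per_weight: float) -> float:
    return PARAMS * bits_per_weight / 8 / 1e9

# bpw values are approximate averages for llama.cpp quant formats.
for name, bpw in [("FP16", 16.0), ("Q8_0", 8.5),
                  ("Q4_K_M", 4.85), ("Q3_K_M", 3.9)]:
    verdict = "fits" if weight_gb(bpw) <= 72 else "does not fit"
    print(f"{name:7s} ~{bpw:5.2f} bpw -> {weight_gb(bpw):6.1f} GB "
          f"({verdict} in 72 GB)")
```

Only the Q3 variant leaves meaningful headroom within 3x24GB, which is consistent with the poster's choice of Q3_K_M over a 4-bit quant.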
// TAGS
mistral-medium-3-5 · llm · open-weights · quantization · inference · gpu · self-hosted · benchmark
DISCOVERED
2026-05-04 (3h ago)
PUBLISHED
2026-05-04 (6h ago)
RELEVANCE
8/10
AUTHOR
jacek2023