OPEN_SOURCE
REDDIT // 5h ago · BENCHMARK RESULT
CMP 100-210 powers Medina 14B inference
A May 5 benchmark paper runs Medina 14B across a CMP 100-210, an RTX 3090, and an RTX 3060, then settles on the CMP card as a dedicated inference GPU. It does not match the 3090 on raw throughput, but it keeps LoRA swaps fast and fits the model at a 36K context window with usable VRAM headroom.
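The 13-18 ms hot-swap figures reported below are easy to sanity-check on your own hardware with a plain wall-clock harness. The `swap_fn` here is a hypothetical placeholder for whatever adapter-switch call your stack exposes (for example, PEFT's `model.set_adapter(name)`); this is a minimal sketch, not the benchmark's actual methodology:

```python
import time

def median_swap_latency_ms(swap_fn, n_iters=100):
    """Median wall-clock latency of calling swap_fn once, in milliseconds."""
    samples = []
    for _ in range(n_iters):
        t0 = time.perf_counter()
        swap_fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]

# Placeholder no-op standing in for a real LoRA adapter switch.
latency_ms = median_swap_latency_ms(lambda: None)
```

Because the harness only measures wall-clock time around a callable, it captures any CPU or I/O cost of the swap, which is exactly the regime the analysis below describes.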
// ANALYSIS
The title oversells parity with the 3090, but the underlying result is still useful: this is a cheap, awkward mining card that becomes surprisingly practical when you care more about VRAM and adapter switching than peak tok/s.
- The CMP 100-210 posts 47.3 tok/s versus 78.1 tok/s on the RTX 3090, so it is not a throughput equal.
- LoRA hot-swap latency stays in the 13-18 ms range across all tested GPUs, which makes the workflow feel CPU/I/O-bound rather than GPU-bound.
- The card's 16 GB of VRAM is enough for a 36K context setup, but the 3090 still buys real safety margin and faster inference.
- Multi-GPU layer splitting underperforms the solo 3090 because PCIe synchronization overhead eats the benefit.
- For local LLM builders on a budget, the CMP 100-210 is interesting as a dedicated inference node, not as a universal 3090 replacement.
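The 36K-context claim above survives a back-of-the-envelope KV-cache check. The layer, head, and dimension figures here are assumed values for a generic 14B GQA architecture, not published Medina 14B specs, and the weight size assumes 4-bit quantization:

```python
# Rough fp16 KV-cache footprint at a 36K context window.
# Assumed architecture (hypothetical, not Medina 14B's published specs):
# 48 layers, 8 KV heads (GQA), head dim 128, fp16 cache (2 bytes/elem).
layers, kv_heads, head_dim = 48, 8, 128
seq_len, bytes_per_elem = 36 * 1024, 2

# K and V each store layers * kv_heads * head_dim values per token.
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem
kv_gib = kv_bytes / 2**30  # 6.75 GiB

# 4-bit quantized weights: ~0.5 bytes per parameter at 14B params.
weight_gib = 14e9 * 0.5 / 2**30  # ~6.5 GiB

total_gib = kv_gib + weight_gib  # ~13.3 GiB: tight, but inside 16 GB
print(f"KV cache {kv_gib:.2f} GiB + weights {weight_gib:.2f} GiB "
      f"= {total_gib:.2f} GiB")
```

Under these assumptions the model plus cache lands around 13.3 GiB, which matches the card's "fits, with usable headroom but less margin than a 24 GB 3090" framing.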
// TAGS
llm · benchmark · gpu · inference · quantization · medina-14b · nvidia-cmp-100-210
DISCOVERED
5h ago
2026-05-05
PUBLISHED
7h ago
2026-05-05
RELEVANCE
7/10
AUTHOR
desexmachina