GMKtec EVO-X2 runs Llama 70B at 5 t/s
OPEN_SOURCE
REDDIT · 3h ago · BENCHMARK RESULT

User testing of the GMKtec EVO-X2 confirms that AMD's Strix Halo architecture handles 70B models locally at usable speeds. The 256GB/s unified memory bandwidth enables roughly 5.25 tokens per second for Llama-3.3-70B, though performance drops as context grows toward 16k tokens.
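The reported numbers line up with a simple bandwidth-bound model of token generation: each decoded token must stream the full weights (plus the KV cache) through memory once. A back-of-envelope sketch, assuming a ~42 GB Q4_K_M footprint for the 70B model and Llama-3-70B's GQA configuration (80 layers, 8 KV heads, head dim 128, fp16 cache) — none of which are stated in the source:

```python
# Back-of-envelope roofline for memory-bound decoding on Strix Halo.
# Assumptions (not from the source): Llama-3.3-70B at Q4_K_M is ~42 GB;
# 80 layers, 8 KV heads (GQA), head dim 128, fp16 KV cache; generating
# one token streams the full weights plus the KV cache once.

BANDWIDTH_GB = 256.0     # unified memory bandwidth, from the article
WEIGHTS_GB = 42.0        # assumed Q4_K_M footprint
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128    # assumed Llama-3-70B config
KV_BYTES_PER_TOKEN = 2 * LAYERS * KV_HEADS * HEAD_DIM * 2  # K+V, fp16

def ceiling_tps(context_tokens: int) -> float:
    """Bandwidth-bound tokens/s: weights + KV cache read per token."""
    kv_gb = context_tokens * KV_BYTES_PER_TOKEN / 1e9
    return BANDWIDTH_GB / (WEIGHTS_GB + kv_gb)

print(f"empty context: {ceiling_tps(0):.1f} t/s ceiling (observed 5.25)")
print(f"16k context:   {ceiling_tps(16_000):.1f} t/s ceiling (observed 2.5)")
```

The empty-context ceiling (~6.1 t/s) sits just above the observed 5.25 t/s, but at 16k the bandwidth model still predicts ~5.4 t/s, well above the observed 2.5 t/s — consistent with the analysis below that attention compute, not bandwidth alone, becomes the limiter at long context.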

// ANALYSIS

The GMKtec EVO-X2 marks a shift in local AI hardware, showing that high-bandwidth unified memory can stand in for expensive dual-GPU setups on 70B-class models.

  • 5.25 t/s is a 3-4x leap over previous-generation iGPUs (RDNA 3), which typically manage ~1.5 t/s on 70B models
  • 128GB of LPDDR5X-8000 is the critical enabler, allowing 70B models to reside entirely in VRAM with room for large KV caches
  • Performance degradation at high context (dropping to 2.5 t/s at 16k) reveals the compute-to-bandwidth bottleneck inherent in current integrated solutions
  • Linux with ROCm 6.4.4 remains the optimal software stack for getting the most out of Strix Halo's gfx1151 GPU
  • The necessity of GRUB tweaks (ttm.pages_limit) highlights that "AI PCs" still require significant manual tuning for enthusiast-level local inference
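The `ttm.pages_limit` tweak mentioned above is a kernel command-line parameter, typically set via GRUB so the iGPU can map most of the 128GB of unified memory as GTT. A sketch of the usual approach — the page counts below are illustrative (TTM counts 4 KiB pages), not values from the source:

```shell
# /etc/default/grub -- illustrative values, tune for your own RAM split.
# ttm.pages_limit is counted in 4 KiB pages: 27648000 pages ≈ 105 GiB,
# which lets the iGPU allocate most of 128 GB of unified memory as GTT.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash ttm.pages_limit=27648000 ttm.page_pool_size=27648000"

# Regenerate the GRUB config and reboot for the change to take effect:
#   sudo update-grub && sudo reboot
# Verify after reboot:
#   cat /sys/module/ttm/parameters/pages_limit
```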
// TAGS
llm · gpu · inference · self-hosted · llama-3-3 · gmktec-evo-x2 · ai-mini-pc

DISCOVERED: 3h ago (2026-04-28)

PUBLISHED: 5h ago (2026-04-28)

RELEVANCE: 8/10

AUTHOR: Non-Technical