5070 Ti, RX 9070 build hits 100 tps
REDDIT // 4h ago // INFRASTRUCTURE · OPEN_SOURCE


A dual-GPU setup combining an NVIDIA RTX 5070 Ti and AMD RX 9070 achieves over 100 tokens per second on Qwen 3.6 35B. By leveraging the llama.cpp Vulkan backend, the "Frankenstein" build effectively pools 32GB of mismatched VRAM for high-speed local inference.

// ANALYSIS

Cross-vendor VRAM pooling via Vulkan is now a viable, high-performance alternative to paying the "CUDA tax" for local LLM inference. The RX 9070's 256-bit bus and 645 GB/s memory bandwidth offer better performance-per-dollar than a second NVIDIA card, while the maturing Vulkan backend in llama.cpp splits a model across NVIDIA and AMD hardware without significant overhead. Strategic layer splitting and mid-size models like Qwen 3.6 35B make 32GB mixed-vendor builds a new sweet spot for high-speed local inference.
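The post does not include the exact command line, but a mixed-vendor setup like this could be sketched with llama.cpp's Vulkan backend. The build flags and runtime options below are real llama.cpp options; the GGUF filename and the 16,16 split ratio are illustrative assumptions (both cards have 16GB of VRAM):

```shell
# Build llama.cpp with the Vulkan backend enabled
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Serve the model with all layers offloaded, split layer-wise
# across the two GPUs in proportion to their VRAM.
# The model filename is hypothetical.
./build/bin/llama-server \
  -m qwen3-35b-q4_k_m.gguf \
  --n-gpu-layers 99 \
  --split-mode layer \
  --tensor-split 16,16
```

With `--split-mode layer`, each GPU holds a contiguous slice of the model's layers, so the two cards never need a shared address space; only small activation tensors cross the PCIe bus between them, which is why mismatched vendors work.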

// TAGS
llama-cpp · gpu · llm · vulkan · qwen · open-source · edge-ai

DISCOVERED: 4h ago (2026-04-18)
PUBLISHED: 6h ago (2026-04-17)
RELEVANCE: 8/10
AUTHOR: DavidBolkonsky