5070 Ti, RX 9070 build hits 100 tps

// 45d ago · INFRASTRUCTURE

A dual-GPU build pairing an NVIDIA RTX 5070 Ti with an AMD RX 9070 reaches over 100 tokens per second on Qwen 3.6 35B. Using the llama.cpp Vulkan backend, the "Frankenstein" build pools 32 GB of mismatched VRAM (16 GB per card) for fast local inference.
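As a rough illustration of how two cards can be treated as one pool, the sketch below derives a split ratio from each card's VRAM and assembles a llama.cpp command line. It is not the builder's actual configuration: the `model.gguf` path, the 1 GB headroom, and the helper names are hypothetical, and the flags (`--n-gpu-layers`, `--split-mode`, `--tensor-split`) are given as I understand llama.cpp's CLI, so check `llama-server --help` on your build.

```python
def tensor_split(vram_gb, reserve_gb=1.0):
    """Return per-device proportions based on usable VRAM.

    reserve_gb is a hypothetical headroom per card for the KV cache,
    compute buffers, and the desktop; tune it for your system.
    """
    usable = [max(v - reserve_gb, 0.0) for v in vram_gb]
    total = sum(usable)
    if total <= 0:
        raise ValueError("no usable VRAM after reserving headroom")
    return [u / total for u in usable]


def llama_server_args(model_path, vram_gb, reserve_gb=1.0):
    """Build an argument list for llama-server (assumed flag names)."""
    split = tensor_split(vram_gb, reserve_gb)
    return [
        "llama-server",
        "--model", model_path,
        "--n-gpu-layers", "99",      # offload all layers to the GPUs
        "--split-mode", "layer",     # place whole layers on each device
        "--tensor-split", ",".join(f"{s:.2f}" for s in split),
    ]


if __name__ == "__main__":
    # Two 16 GB cards, as in the build described above.
    print(" ".join(llama_server_args("model.gguf", [16.0, 16.0])))
```

With equal VRAM on both cards the ratio comes out even; unequal cards would get proportionally more or fewer layers.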

// ANALYSIS

Cross-vendor VRAM pooling via Vulkan is now a viable, high-performance alternative to paying the "CUDA tax" for local LLM inference. The RX 9070's 256-bit bus and 645 GB/s of memory bandwidth offer better performance per dollar than a second NVIDIA card, and the Vulkan backend in llama.cpp has matured enough to pool memory across NVIDIA and AMD hardware without significant overhead. Splitting the workload deliberately between the two cards, and choosing a model such as Qwen 3.6 35B that fits in the pooled 32 GB, makes this a new sweet spot for fast local setups.
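The bandwidth figure matters because single-stream token generation is usually limited by how fast weights can be read from VRAM. A back-of-the-envelope sketch, assuming layers are split across the cards and the cards run one after another for each token, is shown below. The 645 GB/s value comes from the analysis above; the 896 GB/s value for the RTX 5070 Ti is taken from NVIDIA's public spec rather than this item, and the 2.0 GB read per card per token is a purely hypothetical placeholder, not a measurement of Qwen 3.6 35B.

```python
def decode_tps_upper_bound(gb_read_per_token, bandwidth_gb_s):
    """Estimate a ceiling on tokens/s for bandwidth-bound decoding.

    gb_read_per_token: GB of weights each device must read per token.
    bandwidth_gb_s:    memory bandwidth of each device in GB/s.
    Devices are assumed to work sequentially (layer split), so their
    read times add up. Real throughput will be lower.
    """
    if len(gb_read_per_token) != len(bandwidth_gb_s):
        raise ValueError("need one bandwidth value per device")
    seconds_per_token = sum(
        gb / bw for gb, bw in zip(gb_read_per_token, bandwidth_gb_s)
    )
    if seconds_per_token <= 0:
        raise ValueError("devices must read a positive amount of data")
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    # Hypothetical workload: 2.0 GB read per token on each card.
    tps = decode_tps_upper_bound([2.0, 2.0], [896.0, 645.0])
    print(f"upper bound: {tps:.1f} tokens/s")
```

The slower card dominates the sum, which is why the memory bandwidth of the second GPU weighs heavily in a mixed build.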

// TAGS
llama-cpp, gpu, llm, vulkan, qwen, open-source, edge-ai

DISCOVERED: 2026-04-18 (45d ago)
PUBLISHED: 2026-04-17 (45d ago)
RELEVANCE: 8/10
AUTHOR: DavidBolkonsky