5070 Ti, RX 9070 build hits 100 tps
REDDIT // 4h ago // INFRASTRUCTURE · OPEN_SOURCE


A dual-GPU setup combining an NVIDIA RTX 5070 Ti and AMD RX 9070 achieves over 100 tokens per second on Qwen 3.6 35B. By leveraging the llama.cpp Vulkan backend, the "Frankenstein" build effectively pools 32GB of mismatched VRAM for high-speed local inference.

// ANALYSIS

Cross-vendor VRAM pooling via Vulkan is now a viable, high-performance alternative to paying the "CUDA tax" for local LLM inference. The RX 9070's 256-bit bus and 645 GB/s memory bandwidth offer better performance-per-dollar than a second NVIDIA card, while the maturing Vulkan backend in llama.cpp splits a model across NVIDIA and AMD hardware without significant overhead. Strategic layer splitting and mid-size models like Qwen 3.6 35B make 32GB mixed-vendor builds a new sweet spot for high-speed local inference.
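The post does not include the exact command line, but a mixed-vendor setup like this could be sketched with llama.cpp's Vulkan backend. The build flags and runtime options below are real llama.cpp options; the GGUF filename and the 16,16 split ratio are illustrative assumptions (both cards have 16GB of VRAM):

```shell
# Build llama.cpp with the Vulkan backend enabled
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Serve the model with all layers offloaded, split layer-wise
# across the two GPUs in proportion to their VRAM.
# The model filename is hypothetical.
./build/bin/llama-server \
  -m qwen3-35b-q4_k_m.gguf \
  --n-gpu-layers 99 \
  --split-mode layer \
  --tensor-split 16,16
```

With `--split-mode layer`, each GPU holds a contiguous slice of the model's layers, so the two cards never need a shared address space; only small activation tensors cross the PCIe bus between them, which is why mismatched vendors work.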

// TAGS
llama-cpp · gpu · llm · vulkan · qwen · open-source · edge-ai

DISCOVERED: 4h ago (2026-04-18)
PUBLISHED: 6h ago (2026-04-17)
RELEVANCE: 8/10
AUTHOR: DavidBolkonsky