Dev pools Blackwell, Ada GPUs for local LLMs

// 90d ago · INFRASTRUCTURE

A developer upgrading to an RTX Pro 4000 Blackwell is weighing whether to keep an older RTX 2000 Ada alongside it, giving 40GB of combined VRAM for running Qwen mixture-of-experts (MoE) models with llama.cpp. The question reflects a growing trend of combining mismatched professional workstation GPUs to maximize local inference capacity.
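A rough way to check whether a quantized model fits in the combined VRAM is to add the model file size to the KV cache and compare the total against each card's capacity minus some working overhead. The sketch below assumes standard attention (one K and one V vector per layer, KV head, and token) and an F16 cache. The `ModelSpec` fields, the 1 GiB per-GPU overhead, and the example numbers are hypothetical placeholders, not figures from any real Qwen model. It also treats the two cards as one pool, which slightly overstates capacity because each GPU must hold whole layers.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelSpec:
    # Illustrative fields; real values come from the model's config and GGUF file.
    weights_gib: float  # size of the quantized model file
    n_layers: int
    n_kv_heads: int
    head_dim: int


def kv_cache_gib(spec: ModelSpec, n_ctx: int, bytes_per_elem: int = 2) -> float:
    """KV cache size for standard attention: K and V per layer, KV head, and token."""
    elems = 2 * spec.n_layers * spec.n_kv_heads * spec.head_dim * n_ctx
    return elems * bytes_per_elem / GIB


def fits(spec: ModelSpec, gpu_vram_gib: list[float], n_ctx: int,
         overhead_gib_per_gpu: float = 1.0) -> tuple[bool, float]:
    """Return (fits, headroom_gib) against the summed VRAM of all GPUs.

    Ignores per-layer granularity, so treat a small positive headroom with caution.
    """
    usable = sum(v - overhead_gib_per_gpu for v in gpu_vram_gib)
    needed = spec.weights_gib + kv_cache_gib(spec, n_ctx)
    return needed <= usable, usable - needed


if __name__ == "__main__":
    # Hypothetical model: 30 GiB file, 48 layers, 4 KV heads of dimension 128.
    spec = ModelSpec(weights_gib=30.0, n_layers=48, n_kv_heads=4, head_dim=128)
    print(kv_cache_gib(spec, n_ctx=32768))        # 3.0
    print(fits(spec, [24.0, 16.0], n_ctx=32768))  # (True, 5.0)
```

With these made-up numbers, the pair leaves about 5 GiB of headroom while the 24GB card alone would fall short; the model's actual config and file size should be checked before relying on such an estimate.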

// ANALYSIS

Mixing GPU architectures and VRAM sizes is a common tactic in the local LLM community for turning disparate hardware into viable, higher-capacity inference servers.

–llama.cpp natively splits a model's layers across multiple GPUs, including mismatched ones, so a combined 24GB and 16GB setup can fit larger open-weight models than either card could hold alone.
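–Because a layer split passes only the activations at each GPU boundary, standard PCIe slots provide ample bandwidth for token generation, though the prompt processing (prefill) phase, which moves activations for many tokens at once, may see slight bottlenecks compared to a faster interconnect such as NVLink.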
–Making the newer, faster Blackwell card the primary device and weighting the split toward it keeps most layers, along with their share of the KV cache, on the faster GPU for better generation speed, while the Ada card's VRAM still holds the rest of the model.
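llama.cpp's `--tensor-split` option takes per-GPU proportions, so a 24,16 ratio mirrors the two cards' VRAM. The sketch below shows roughly how such a ratio maps onto whole layers and builds a matching `llama-server` command line. The largest-remainder rounding in `split_layers` is an illustrative choice, not llama.cpp's internal algorithm, and the model path and 48-layer count are hypothetical.

```python
import math


def split_layers(n_layers: int, ratio: list[float]) -> list[int]:
    """Assign whole layers to GPUs in proportion to `ratio` (largest remainder)."""
    total = sum(ratio)
    exact = [n_layers * r / total for r in ratio]
    counts = [math.floor(x) for x in exact]
    # Hand any leftover layers to the GPUs with the largest fractional remainders.
    leftover = n_layers - sum(counts)
    order = sorted(range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def llama_server_argv(model_path: str, ratio: list[float]) -> list[str]:
    """Build a llama-server command that offloads all layers and splits them by `ratio`."""
    return [
        "llama-server",
        "-m", model_path,
        "--n-gpu-layers", "999",                          # offload every layer to the GPUs
        "--split-mode", "layer",                          # whole layers per GPU
        "--tensor-split", ",".join(f"{r:g}" for r in ratio),  # e.g. "24,16"
    ]


if __name__ == "__main__":
    print(split_layers(48, [24.0, 16.0]))  # [29, 19]
    # llama-server -m qwen-moe.gguf --n-gpu-layers 999 --split-mode layer --tensor-split 24,16
    print(" ".join(llama_server_argv("qwen-moe.gguf", [24.0, 16.0])))
```

The first ratio entry applies to whichever GPU CUDA enumerates first; `CUDA_VISIBLE_DEVICES` can reorder devices so the Blackwell card is listed first, and a different ratio skews the split further.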

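The PCIe point can be sanity-checked with simple arithmetic: under a layer split, each boundary between GPUs carries one hidden-state vector per token. The sketch below assumes F32 activations, a hypothetical hidden size of 4096, and an idealized 16 GB/s effective PCIe link; none of these figures come from the source, and the estimate ignores latency and synchronization overhead.

```python
def boundary_bytes(n_tokens: int, hidden_size: int, n_boundaries: int = 1,
                   bytes_per_elem: int = 4) -> int:
    """Bytes of activations crossing GPU boundaries in one forward pass.

    Assumes each boundary carries one hidden_size vector per token (F32 by default).
    """
    return n_tokens * hidden_size * bytes_per_elem * n_boundaries


def transfer_ms(n_bytes: int, link_gb_per_s: float) -> float:
    """Idealized transfer time in milliseconds, ignoring latency and protocol overhead."""
    return n_bytes / (link_gb_per_s * 1e9) * 1e3


if __name__ == "__main__":
    decode = boundary_bytes(n_tokens=1, hidden_size=4096)      # one generated token
    prefill = boundary_bytes(n_tokens=4096, hidden_size=4096)  # a 4096-token prompt
    print(decode, f"{transfer_ms(decode, 16.0):.6f} ms")
    print(prefill, f"{transfer_ms(prefill, 16.0):.3f} ms")
```

Under those assumptions, one generated token moves 16 KiB across the boundary (about a microsecond), while a 4096-token prefill moves 64 MiB (about 4.2 ms), which is why prefill is the phase more likely to notice the interconnect.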
// TAGS

llama-cpp · gpu · inference · llm · open-weights

DISCOVERED: 2026-04-18 (90d ago)

PUBLISHED: 2026-04-18 (90d ago)

RELEVANCE: 7/10

AUTHOR: bromatofiel

// KEEP READING

MODEL · 12m ago

OpenRouter adds nine new AI models

Unified API provider OpenRouter has added nine major new AI models to its platform, highlighted by Moonshot AI's Kimi K3, Meta AI's Muse Spark 1.1, and Thinking Machines Lab's Inkling. The additions provide developers with immediate API access to these frontier systems for tasks ranging from long-horizon coding and tool use to multimodal reasoning.

UPDATE · 54m ago

Tesana automates character weapon rigging

Tesana AI has rolled out an engine upgrade that automates character weapon and item attachments, bypassing the tedious manual rigging process. By automatically handling grip points and alignment, the engine allows developers to speed up asset importing and focus on core game design.

BENCHMARK · 1h ago

GLM-5.2 matches closed models on cyber tasks

The UK AI Security Institute (AISI) has released evaluation results from testing leading open-weight AI models against closed frontier systems on practical cyber work, such as vulnerability research, reverse engineering, exploitation, and multi-step network attacks. The benchmark results indicate that the performance gap between open-weight and closed-weight models is shrinking rapidly, with Z.ai's open-weight GLM-5.2 matching the cyber capabilities of closed frontier models released just four to seven months prior.