OPEN_SOURCE
REDDIT // BENCHMARK RESULT
"RTX 5090 dominates local AI benchmarks" follows headlinese.
New benchmarks for the sparse MoE model Qwen3.6-35B-A3B reveal that the NVIDIA RTX 5090 achieves a record-breaking 220+ tokens per second using llama.cpp. While NVIDIA's GDDR7 bandwidth provides a massive leap in raw generation speed, the Mac M5 Max remains the "context king" for developers needing massive 128GB unified memory pools for repository-level reasoning.
// ANALYSIS
The RTX 5090’s GDDR7 bandwidth finally makes sparse MoE models feel like local "instant" intelligence, but Apple’s memory architecture still wins on utility for deep codebase reasoning.
- The 5090 delivers a ~30% generation speed increase over the 4090, peaking at 240 t/s during long-context generation.
- Qwen3.6-35B-A3B activates only 3B parameters per token, allowing the aging RTX 3090 to still deliver a respectable 140 t/s.
- The Mac M5 Max is limited by memory bandwidth on raw speed (~95 t/s) but can natively host 1M-token context windows that would take 4+ RTX 3090s to fit in VRAM.
- These results suggest that for agentic developer workflows, the 5090 is the new gold standard for latency, while high-RAM Macs remain the standard for large-scale repo analysis.
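Why bandwidth dominates here: single-stream token generation mostly streams the model's active weights from memory once per token, so memory bandwidth divided by bytes read per token gives a rough speed ceiling. The sketch below illustrates that roofline; the bandwidth figures and Q4-class quantization are assumptions for illustration, not official specs, and real llama.cpp numbers land well below the ceiling due to routing, KV-cache reads, and kernel overhead.

```python
# Back-of-envelope decode-speed roofline for a sparse MoE model.
# Assumption: generation is memory-bandwidth-bound, so an upper bound
# is bandwidth / bytes-read-per-token (active expert weights only).

ACTIVE_PARAMS = 3e9      # ~3B activated parameters per token (the "A3B")
BITS_PER_WEIGHT = 4.5    # assumed ~Q4-class quantization

bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8  # ~1.7 GB/token

def roofline_tps(bandwidth_gbs: float) -> float:
    """Theoretical max tokens/s if every byte of active weights is
    streamed once per generated token at full memory bandwidth."""
    return bandwidth_gbs * 1e9 / bytes_per_token

# Bandwidths below are illustrative assumptions, not vendor specs.
for name, bw_gbs in [
    ("GDDR7 card, ~1790 GB/s", 1790),
    ("GDDR6X card, ~936 GB/s", 936),
    ("Unified memory, ~546 GB/s", 546),
]:
    print(f"{name}: <= {roofline_tps(bw_gbs):.0f} t/s ceiling")
```

Even this crude model reproduces the ordering in the benchmarks: the higher-bandwidth discrete GPU has several times the headroom of a unified-memory machine, which is why the latter's advantage shows up in capacity (context size) rather than raw t/s.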
// TAGS
llm · gpu · benchmark · open-weights · qwen-3-6 · rtx-5090 · mac-m5-max · llama-cpp
DISCOVERED
5h ago
2026-04-20
PUBLISHED
6h ago
2026-04-19
RELEVANCE
9/10
AUTHOR
chain-77