RTX 5090 dominates local AI benchmarks

// 45d ago · BENCHMARK RESULT

New benchmarks of the sparse mixture-of-experts (MoE) model Qwen3.6-35B-A3B show the NVIDIA RTX 5090 reaching a record-breaking 220+ tokens per second of generation with llama.cpp. While the card's GDDR7 memory bandwidth delivers a large leap in raw generation speed, the Mac M5 Max remains the "context king" for developers who need its 128GB unified memory pool for repository-level reasoning.
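Why bandwidth matters here: single-stream token generation is usually limited by how fast weights can be streamed from memory, because every active weight is read once per new token. The sketch below estimates that ceiling as memory bandwidth divided by bytes read per token. It is a back-of-envelope model, not the benchmark's methodology. The ~4.5 bits per weight quantization is an assumption; 1,792 GB/s is the RTX 5090's published peak memory bandwidth.

```python
def decode_ceiling_tps(bandwidth_gb_s: float, active_params_billion: float,
                       bits_per_weight: float) -> float:
    """Rough upper bound on single-stream decode speed (tokens/s).

    Assumes generation is memory-bandwidth bound and that every active weight
    is read once per token. Ignores KV-cache reads, kernel overhead, and the
    gap between peak and sustained bandwidth, so real numbers land far lower.
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    BITS = 4.5          # ASSUMED quantization level, not taken from the benchmark
    RTX_5090_BW = 1792  # GB/s, published peak bandwidth of the RTX 5090
    moe = decode_ceiling_tps(RTX_5090_BW, active_params_billion=3, bits_per_weight=BITS)
    dense = decode_ceiling_tps(RTX_5090_BW, active_params_billion=35, bits_per_weight=BITS)
    print(f"3B-active MoE ceiling: {moe:.0f} t/s")
    print(f"35B dense ceiling:     {dense:.0f} t/s")
```

Under these assumptions the 3B-active model's ceiling is 35/3 ≈ 11.7 times higher than a dense 35B model's on the same card, which is why a sparse MoE can clear 200 t/s at all. Measured throughput sits well below the ceiling.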

// ANALYSIS

The RTX 5090’s GDDR7 bandwidth finally makes sparse MoE models feel like "instant" local intelligence, but Apple’s unified memory architecture still wins on capacity for deep codebase reasoning.

  • The 5090 delivers a ~30% generation speed increase over the 4090, peaking at 240 t/s during long-context generation.
  • Qwen3.6-35B-A3B activates only 3B parameters per token, allowing the aging RTX 3090 to still deliver a respectable 140 t/s.
  • The Mac M5 Max is limited by memory bandwidth in raw speed (~95 t/s), but it can hold 1M-token context windows in unified memory that would take four or more RTX 3090s to fit in VRAM.
  • These results suggest that for agentic developer workflows the 5090 is the new gold standard for latency, while high-RAM Macs remain the standard for large-scale repository analysis.
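The capacity argument in the third bullet comes down to KV-cache size, which grows linearly with context length: 2 (keys and values) × layers × KV heads × head dimension × bytes per element × tokens. The sketch below applies that formula. The layer count, KV-head count, and head dimension are HYPOTHETICAL values chosen for illustration, not the published Qwen3.6-35B-A3B configuration, and ~4.5 bits per weight is an assumed quantization. Note that an MoE still needs all 35B weights resident even though only ~3B are active per token. Architectures with fewer full-attention layers, or a quantized KV cache, would need considerably less memory.

```python
import math


def kv_cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for n_tokens (fp16 by default)."""
    # Factor of 2: one K tensor and one V tensor per attention layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


def gpus_needed(total_bytes: float, vram_bytes_per_gpu: float) -> int:
    """Minimum GPU count by raw capacity; ignores activations and runtime overhead."""
    return math.ceil(total_bytes / vram_bytes_per_gpu)


# HYPOTHETICAL architecture values for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 40, 4, 128
CONTEXT = 1_000_000

# All 35B parameters stay resident even though only ~3B are active per token;
# ~4.5 bits per weight is an ASSUMED quantization level.
weights_bytes = 35e9 * 4.5 / 8
kv_bytes = kv_cache_bytes(CONTEXT, N_LAYERS, N_KV_HEADS, HEAD_DIM)

if __name__ == "__main__":
    print(f"KV cache: {kv_bytes / 1e9:.1f} GB, weights: {weights_bytes / 1e9:.1f} GB")
    print("24 GB RTX 3090s needed:", gpus_needed(weights_bytes + kv_bytes, 24e9))
```

With these made-up numbers the total comes to roughly 102 GB: more than four 24 GB cards can hold, but within a 128GB unified memory pool.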
// TAGS
llm, gpu, benchmark, open-weights, qwen-3-6, rtx-5090, mac-m5-max, llama-cpp

DISCOVERED

45d ago

2026-04-20

PUBLISHED

45d ago

2026-04-19

RELEVANCE

9/10

AUTHOR

chain-77