Qwen3.5-35B-A3B strains RTX 4090, RAM load expected

// 63d ago · MODEL RELEASE

A LocalLLaMA user asks whether the memory footprint they see while running Qwen3.5-35B-A3B on an RTX 4090 is expected, and whether the model is also spilling into system RAM. The question comes down to whether that footprint is normal for a large Qwen checkpoint with a very large default context window.

// ANALYSIS

Some system RAM use is likely normal here. The A3B suffix suggests a sparse (mixture-of-experts) setup with roughly 3B parameters active per token, but all of the roughly 35B weights still have to be stored somewhere, so active-path size is only part of the memory story. The official model card shows a 262,144-token default context and serving recipes that assume tensor parallelism across 8 GPUs, which is a strong signal that single-card runs are memory-constrained. If the backend is offloading weights or KV cache, host RAM use is expected rather than suspicious. For newcomers, the important knobs are quantization, context length, and backend choice, not just the GPU model. The post is a useful sanity check: a big model on a 4090 usually means compromises, not a bug.
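The arithmetic behind that claim can be sketched in a few lines of Python. This is a back-of-envelope estimate, not a measurement. It assumes a total of about 35B parameters (read off the model name), 24 GiB of VRAM on the RTX 4090, and a plain per-layer KV cache formula that may not match this model's actual attention design. The layer count, KV head count, and head dimension below are hypothetical placeholders, not values from the model card.

```python
GIB = 1024 ** 3

def weight_bytes(total_params: float, bits_per_param: float) -> float:
    """Approximate bytes needed to store every weight at a given precision."""
    return total_params * bits_per_param / 8

def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """KV cache size assuming standard attention in every layer:
    one key and one value vector per KV head, per layer, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

VRAM_GIB = 24            # RTX 4090
TOTAL_PARAMS = 35e9      # assumption: taken from the "35B" in the model name

# Weights alone, at three common precisions.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    gib = weight_bytes(TOTAL_PARAMS, bits) / GIB
    verdict = "fits in" if gib < VRAM_GIB else "exceeds"
    print(f"{label} weights: {gib:.1f} GiB ({verdict} {VRAM_GIB} GiB VRAM)")

# KV cache at the full default context, with HYPOTHETICAL architecture values.
HYPOTHETICAL = dict(layers=40, kv_heads=2, head_dim=128)
kv_gib = kv_cache_bytes(262_144, **HYPOTHETICAL) / GIB
print(f"KV cache at 262,144 tokens (hypothetical config): {kv_gib:.1f} GiB")
```

Under these assumptions only the 4-bit weights fit on the card, and a full-length context would take most of the remaining headroom. That is why a backend that spills weights or KV cache into host RAM is the expected outcome rather than a fault.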

// TAGS
qwen3-5-35b-a3b, llm, inference, gpu, self-hosted, open-weights, multimodal

DISCOVERED

63d ago

2026-03-25

PUBLISHED

63d ago

2026-03-25

RELEVANCE

8/10

AUTHOR

fernandollb