Qwen3.5-35B-A3B strains RTX 4090, RAM load expected
A LocalLLaMA user asks whether the memory footprint they see while running Qwen3.5-35B-A3B on an RTX 4090 is expected, and whether the model is also spilling into system RAM, given that it is a large Qwen checkpoint with a very large default context window.
Some RAM use is likely normal here. The A3B suffix suggests a sparse mixture-of-experts setup with only a few billion parameters active per token, so the active-path size is only part of the memory story: the full set of expert weights still has to live somewhere. The official model card shows a 262,144-token default context and serving recipes that assume tensor parallelism across 8 GPUs, which is a strong signal that single-card runs are memory-constrained. If the backend is offloading weights or KV cache to the host, system RAM use is expected rather than suspicious. For newcomers, the important knobs are quantization, context length, and backend choice, not just the GPU model. The post is a useful sanity check: a big model on a 4090 usually means compromises, not a bug.
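To see why context length dominates at this scale, a back-of-the-envelope KV-cache estimate helps. The sketch below uses the standard formula (2 tensors per layer, one each for K and V); the layer/head/dimension numbers are hypothetical GQA-style values chosen for illustration, not the real Qwen3.5-35B-A3B architecture.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Rough KV-cache size: K and V each store n_kv_heads * head_dim
    values per layer per token (bytes_per_elem=2 assumes fp16/bf16)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical config for illustration only (not the real model's numbers):
gib = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128,
                     context_len=262_144) / 1024**3
print(f"{gib:.1f} GiB")  # → 48.0 GiB at the full 262k default context
```

Even with these modest assumed dimensions, the cache alone at the default context dwarfs a 4090's 24 GB, which is why backends fall back to shorter contexts, quantized KV caches, or host-RAM offload.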
DISCOVERED
2026-03-25
PUBLISHED
2026-03-25
AUTHOR
fernandollb