Qwen-Scope maps Qwen 3.5 hidden features
The Qwen Team released Qwen-Scope, a public suite of sparse autoencoders (SAEs) for Qwen 3.5 models spanning 2B through 35B MoE, plus a Hugging Face Space for feature exploration and steering. It exposes residual-stream features across layers, turning model internals into something researchers can inspect, localize, and intervene on.
This is a serious interpretability release, not just another model dump. Qwen is making feature-level control and debugging practical across a broad model family, which moves SAEs from niche research toward usable tooling.
- Coverage across multiple Qwen 3.5 sizes makes this more useful than a one-off demo on a single checkpoint.
- Residual-stream, all-layer coverage matters because it lets you trace behaviors like language switching, refusals, and style drift to specific learned features.
- Steering and ablation are the obvious headline use cases, but the more durable value is debugging and dataset auditing for fine-tunes.
- It is also plainly dual-use: the same machinery that helps explain behavior can be used to suppress safety-related features or push the model toward unwanted behaviors.
- Compared with prompt-only control, feature editing is much more surgical, which is why interpretability folks will care and policy folks will be uneasy.
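
To make "steering" and "ablation" concrete, here is a minimal NumPy sketch of what editing one SAE feature in a residual-stream vector can look like. It is not Qwen-Scope's API: the weights are random stand-ins for a trained SAE, the sizes are toy values, and the names `encode`, `ablate`, and `steer` are hypothetical helpers chosen for illustration. It assumes the common ReLU SAE layout with unit-norm decoder directions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 16, 64  # toy sizes, not the real Qwen 3.5 dimensions

# Random stand-ins for trained SAE weights (assumption: a real SAE
# would be loaded from a released checkpoint instead).
W_enc = rng.normal(size=(d_model, n_features)) / np.sqrt(d_model)
W_dec = rng.normal(size=(n_features, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)  # unit-norm directions
b_enc = np.zeros(n_features)
b_dec = np.zeros(d_model)


def encode(h):
    """Feature activations: ReLU of an affine map of the residual vector."""
    return np.maximum(0.0, (h - b_dec) @ W_enc + b_enc)


def ablate(h, idx):
    """Subtract feature idx's decoded contribution from h.

    Editing h directly (rather than replacing it with the SAE
    reconstruction) leaves everything the SAE fails to capture untouched.
    """
    return h - encode(h)[idx] * W_dec[idx]


def steer(h, idx, alpha):
    """Push h along feature idx's decoder direction by alpha."""
    return h + alpha * W_dec[idx]


h = rng.normal(size=d_model)       # one token's residual-stream vector
top = int(np.argmax(encode(h)))    # most active feature for this token
h_ablated = ablate(h, top)
h_steered = steer(h, top, alpha=4.0)
```

In a real model the edited vector would be written back into the forward pass at the same layer, typically through a hook. The same few lines serve both sides of the dual-use concern: `ablate` on a refusal-related feature and `steer` on a style feature are the same operation with different indices.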
Discovered: 2026-04-30
Published: 2026-04-30
Author: MadPelmewka