Qwen3.5-24B REAP squeezes agentic coding into 16GB

REDDIT // PRODUCT LAUNCH // OPEN_SOURCE

A LocalLLaMA contributor released a 32% expert-pruned GGUF variant of Qwen3.5-35B-A3B aimed at coding and agentic workflows on lower-VRAM hardware. The release includes quantized checkpoints, pruning/quantization scripts, and a reproducible Modal pipeline.
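
For intuition only, here is a minimal sketch of saliency-based expert pruning under a REAP-like criterion: score each expert by its router-weighted activation magnitude over calibration tokens, then drop the lowest scorers. Every name, shape, and the synthetic "calibration" data below are illustrative assumptions, not the author's actual scripts:

```python
# Toy sketch of saliency-based MoE expert pruning (REAP-like criterion).
# Illustrative only: real scores come from forward passes over calibration text.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 256    # experts per MoE layer before pruning
KEEP_EXPERTS = 175   # experts kept after ~32% pruning
TOP_K = 8            # experts routed per token (assumed value)
NUM_TOKENS = 4096    # calibration tokens (assumed value)

# Fake calibration statistics: router logits and per-expert output norms.
router_logits = rng.normal(size=(NUM_TOKENS, NUM_EXPERTS))
expert_out_norm = rng.gamma(2.0, 1.0, size=(NUM_TOKENS, NUM_EXPERTS))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

gate = softmax(router_logits)

# Zero out gate weights outside each token's top-k routing choices.
topk_idx = np.argpartition(-gate, TOP_K, axis=1)[:, :TOP_K]
mask = np.zeros_like(gate)
np.put_along_axis(mask, topk_idx, 1.0, axis=1)
routed_gate = gate * mask

# Saliency: mean router-weighted output magnitude per expert.
saliency = (routed_gate * expert_out_norm).mean(axis=0)

# Keep the highest-saliency experts; the rest are dropped from the checkpoint.
keep = np.sort(np.argsort(-saliency)[:KEEP_EXPERTS])
print(f"pruning {NUM_EXPERTS - len(keep)} of {NUM_EXPERTS} experts")
print("first kept expert ids:", keep[:10])
```

In a real pipeline, the surviving experts (plus a remapped router) would be written back into the checkpoint before GGUF conversion and quantization.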

// ANALYSIS

This is a practical community optimization drop, not a new base model, but it materially lowers the barrier to running strong MoE coding models locally.

  • The prune trims the expert count from 256 to 175 (~32% of experts removed) while keeping ~3B active parameters per token, shrinking the ~35B checkpoint to roughly 24B total parameters for better memory efficiency.
  • The recommended IQ4_K_S GGUF is positioned for 16GB-class GPUs, which is the core value proposition here (see the loading sketch after this list).
  • The author shares full replication assets (REAP fork + Modal scripts), making this useful for other quantizers and pruning experiments.
  • Calibration limits (1024 context and memory pressure during profiling) suggest further quality/perf gains are still possible.
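
For the 16GB-class target, a minimal loading sketch with llama-cpp-python is below. The model path is hypothetical and the `n_gpu_layers`/`n_ctx` settings are assumptions to tune; note too that if the IQ4_K_S quant type requires the ik_llama.cpp fork, a mainline build would need a supported quant such as IQ4_XS or Q4_K_S instead.

```python
# Hedged usage sketch: loading the pruned GGUF on a 16GB-class GPU.
# File name is hypothetical; settings are starting points, not the author's.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.5-24B-REAP-IQ4_K_S.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload all layers; ~24B MoE at ~4-bit targets 16GB VRAM
    n_ctx=8192,       # modest context; raise if VRAM headroom allows
)

out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Write a Python function that reverses a linked list."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

Full GPU offload with a modest context is the usual starting point on 16GB cards; shrink `n_ctx` or offload fewer layers if loading fails.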
// TAGS
qwen3.5 · local-llm · gguf · moe · quantization · agentic-coding

DISCOVERED

2026-03-05

PUBLISHED

2026-03-04

RELEVANCE

8/10

AUTHOR

tubuntu2