OPEN_SOURCE ↗
REDDIT // 38d ago · PRODUCT LAUNCH
Qwen3.5-24B REAP squeezes agentic coding into 16GB
A LocalLLaMA contributor released a 32% expert-pruned GGUF variant of Qwen3.5-35B-A3B aimed at coding and agentic workflows on lower-VRAM hardware. The release includes quantized checkpoints, pruning/quantization scripts, and a reproducible Modal pipeline.
// ANALYSIS
This is a practical community optimization drop, not a new base model, but it materially lowers the barrier to running strong MoE coding models locally.
- The model trims experts from 256 to 175 while keeping ~3B active parameters per token, targeting better memory efficiency.
- The recommended IQ4_K_S GGUF is positioned for 16GB-class GPUs, which is the core value proposition here.
- The author shares full replication assets (a REAP fork plus Modal scripts), making this useful for other quantizers and pruning experiments.
- Calibration limits (1024-token context and memory pressure during profiling) suggest further quality and performance gains are still possible.
// TAGS
qwen3.5 · local-llm · gguf · moe · quantization · agentic-coding
DISCOVERED
2026-03-05
PUBLISHED
2026-03-04
RELEVANCE
8/10
AUTHOR
tubuntu2