Memla CLI Claims 9B Beats 32B Raw
Memla is a CLI for local Ollama coding models that wraps smaller models in a bounded constraint-repair and backtest loop instead of prompting them raw. According to the public repo, its current proof packet shows `qwen3.5:9b + Memla` beating raw `qwen2.5:32b` on an OAuth patch execution benchmark: an apply rate of 0.67 and a semantic success rate of 0.67, versus 0.00 for the raw 32B run. The claim is explicitly scoped to verifier-backed code execution tasks, not general model superiority.
This is a strong reminder that runtime design can matter as much as model size when the task is narrow and testable.
- The interesting part is not the model, but the scaffolding: Memla adds planning, repair, and verification around local Ollama models.
- The repo frames the claim carefully as bounded execution performance, which is more credible than a blanket “9B beats 32B” headline.
- The benchmark result is still self-reported and narrow, so it reads as an engineering proof point rather than a general scientific conclusion.
- If the loop is robust, this could be useful for local-first dev workflows where users care about passing tests more than fluent chat.
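To make the idea of a bounded repair loop concrete, here is a minimal Python sketch of the general pattern: propose a patch, check it with a verifier, feed the failure back, and stop after a fixed number of attempts. It is not Memla's code. Every name in it (`Verdict`, `bounded_repair`, `propose`, `verify`, `max_attempts`) is hypothetical, and the model and verifier are passed in as plain callables so the sketch runs without Ollama.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Verdict:
    """Outcome of one verification pass (hypothetical structure)."""
    applied: bool        # did the patch apply cleanly?
    passed: bool         # did the checks pass after applying it?
    feedback: str = ""   # failure text handed back to the model


def bounded_repair(
    propose: Callable[[str, str], str],
    verify: Callable[[str], Verdict],
    task: str,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask for a patch, verify it, and retry with feedback up to a fixed bound.

    Returns (patch, attempts_used), or (None, max_attempts) if nothing verified.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        patch = propose(task, feedback)
        verdict = verify(patch)
        if verdict.applied and verdict.passed:
            return patch, attempt
        # Carry the verifier's complaint into the next proposal.
        feedback = verdict.feedback
    return None, max_attempts


# Stand-ins for a model call and a test run, for illustration only.
def fake_propose(task: str, feedback: str) -> str:
    return "good patch" if feedback else "bad patch"


def fake_verify(patch: str) -> Verdict:
    ok = patch == "good patch"
    return Verdict(applied=ok, passed=ok, feedback="" if ok else "tests failed")


result = bounded_repair(fake_propose, fake_verify, "fix OAuth callback")
single_shot = bounded_repair(fake_propose, fake_verify, "fix OAuth callback", max_attempts=1)
```

With these stand-ins, the first proposal fails verification and the second succeeds, while the single-attempt run gives up. That is the gap between a raw prompt and a verifier-backed loop that the summary describes, reduced to a toy case.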
DISCOVERED: 2026-04-04
PUBLISHED: 2026-04-04
AUTHOR: Willing-Opening4540