OPEN_SOURCE
REDDIT // 8d ago // BENCHMARK RESULT
Memla CLI Claims 9B Beats 32B Raw
Memla is a CLI for local Ollama coding models that wraps smaller models in a bounded constraint-repair and backtest loop instead of prompting them raw. The public repo says its current proof packet shows `qwen3.5:9b + Memla` beating raw `qwen2.5:32b` on an OAuth patch execution benchmark, with 0.67 apply and 0.67 semantic success rates versus 0.00 for the raw 32B run. The claim is explicitly scoped to verifier-backed code execution tasks, not general model superiority.
// ANALYSIS
This is a strong reminder that runtime design can matter as much as model size when the task is narrow and testable.
- The interesting part is not the model, but the scaffolding: Memla adds planning, repair, and verification around local Ollama models.
- The repo frames the claim carefully as bounded execution performance, which is more credible than a blanket “9B beats 32B” headline.
- The benchmark result is still self-reported and narrow, so it reads as an engineering proof point rather than a general scientific conclusion.
- If the loop is robust, this could be useful for local-first dev workflows where users care about passing tests more than fluent chat.
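The scaffolding pattern the post describes, generate a patch, run a verifier, and feed failures back until tests pass or a budget runs out, can be sketched in a few lines. This is a hypothetical illustration of a bounded constraint-repair loop, not Memla's actual implementation; the `generate`/`verify` names, signatures, and stubs below are assumptions for the demo.

```python
# Hypothetical sketch of a bounded generate-verify-repair loop, the pattern
# the post attributes to Memla. Not taken from the Memla repo.
from typing import Callable, Optional, Tuple

def repair_loop(
    generate: Callable[[str, Optional[str]], str],  # (prompt, last error) -> candidate patch
    verify: Callable[[str], Tuple[bool, str]],      # candidate -> (passed, error message)
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask the model for a patch, verify it, and feed failures back
    until the verifier passes or the attempt budget is spent."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        candidate = generate(prompt, error)
        ok, error = verify(candidate)
        if ok:
            return candidate
    return None  # budget exhausted: report failure instead of shipping unverified code

# Stub "model": returns a buggy patch first, then fixes it once it sees the error.
def stub_generate(prompt: str, error: Optional[str]) -> str:
    return "return a + b" if error else "return a - b"

# Stub verifier standing in for a real test run.
def stub_verify(candidate: str) -> Tuple[bool, str]:
    passed = candidate == "return a + b"
    return passed, "" if passed else "test_add failed: expected 3, got -1"

print(repair_loop(stub_generate, stub_verify, "implement add(a, b)"))  # → return a + b
```

The key design point mirrored from the post: success is defined by the verifier passing, not by the fluency of the model's first answer, which is why a small model with a repair budget can outscore a larger model prompted raw on narrow, testable tasks.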
// TAGS
local-llm · ollama · cli · coding-assistant · benchmark · code-execution · open-source
DISCOVERED
2026-04-04
PUBLISHED
2026-04-04
RELEVANCE
8/10
AUTHOR
Willing-Opening4540