Mahoraga says Qwen3 4B tops cloud agents

// 90d ago · BENCHMARK RESULT

Mahoraga is an open-source LLM orchestrator that routes tasks between local and cloud agents with a LinUCB contextual bandit. Its benchmark claims Qwen3 4B is the best code and refactor model in the stack, beating the cloud agents on quality while running locally on a 16GB MacBook Pro.

// ANALYSIS

The interesting part is not just that a local model wins on one benchmark slice, but that Mahoraga turns routing itself into a learnable system instead of a hand-tuned ruleset. That makes this more compelling as infrastructure than as a one-off model leaderboard post.

– The strongest claim is narrow but useful: Qwen3 4B looks best for code/refactor, while other agents still win on research or planning buckets.
– The LinUCB setup is the real product story: it learns per-bucket routing over time, which is exactly the kind of adaptation static “best model” rules miss.
– The benchmark design is practical: no LLM-as-judge, no paid eval loop, and a local hardware target that matches the “I have a laptop, not a datacenter” constraint.
– The weak spot is the scorer itself: security came out flat across agents, so the system still needs better task-specific evaluation before anyone should trust it broadly.
– This will matter most to developers who care about cost control, offline workflows, or shuttling tasks between cheap local models and premium cloud models automatically.
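To make the per-bucket routing idea above concrete, here is a minimal sketch of disjoint LinUCB in plain Python. It is not Mahoraga's code: the post does not describe the project's context features, reward signal, or exploration settings, so the one-hot bucket context, the agent names (`local_model`, `cloud_agent`), the `alpha` value, and every number in `DEMO_REWARD` are assumptions made up for illustration.

```python
import math

# Hypothetical task buckets; the context vector is a one-hot over these.
BUCKETS = ["code", "research", "planning"]


def one_hot(bucket):
    return [1.0 if b == bucket else 0.0 for b in BUCKETS]


class LinUCBArm:
    """One arm (agent) of disjoint LinUCB, storing A^-1 and b."""

    def __init__(self, dim):
        # A starts as the identity matrix, so A^-1 is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _A_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def score(self, x, alpha):
        # Estimated reward theta^T x plus an exploration bonus
        # alpha * sqrt(x^T A^-1 x), where theta = A^-1 b.
        theta = self._A_inv_times(self.b)
        A_inv_x = self._A_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = sum(xi * v for xi, v in zip(x, A_inv_x))
        return mean + alpha * math.sqrt(max(width, 0.0))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^-1 after A += x x^T.
        A_inv_x = self._A_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, A_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        for i in range(len(x)):
            self.b[i] += reward * x[i]


class Router:
    def __init__(self, agents, alpha=1.0):
        self.alpha = alpha
        self.arms = {name: LinUCBArm(len(BUCKETS)) for name in agents}

    def choose(self, bucket):
        # Explore/exploit pick: highest upper confidence bound.
        x = one_hot(bucket)
        return max(self.arms, key=lambda n: self.arms[n].score(x, self.alpha))

    def best(self, bucket):
        # Pure exploitation: highest estimated reward, no bonus.
        x = one_hot(bucket)
        return max(self.arms, key=lambda n: self.arms[n].score(x, 0.0))

    def record(self, agent, bucket, reward):
        self.arms[agent].update(one_hot(bucket), reward)


# Made-up, noise-free quality scores used only to drive the demo loop.
DEMO_REWARD = {
    ("local_model", "code"): 0.9, ("cloud_agent", "code"): 0.6,
    ("local_model", "research"): 0.4, ("cloud_agent", "research"): 0.8,
    ("local_model", "planning"): 0.5, ("cloud_agent", "planning"): 0.7,
}

router = Router(["local_model", "cloud_agent"], alpha=1.0)
for step in range(300):
    bucket = BUCKETS[step % len(BUCKETS)]
    agent = router.choose(bucket)
    router.record(agent, bucket, DEMO_REWARD[(agent, bucket)])

for bucket in BUCKETS:
    print(bucket, "->", router.best(bucket))
```

The design point is that nothing in the router hard-codes which agent handles which bucket. Each agent keeps its own reward estimate per context, and the confidence bonus shrinks as an agent accumulates feedback for a bucket, so under-tested agents still get occasional traffic.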

// TAGS

mahoraga, llm, ai-coding, agent, benchmark, open-source, cli, automation

DISCOVERED

90d ago

2026-04-28

PUBLISHED

90d ago

2026-04-27

RELEVANCE

9/10

AUTHOR

Own-Professional3092
