SmallHarness launches version 1.0 with model routing, followed by a hotfix introducing active effort routing to control reasoning depth for local and cloud LLMs.

// 47d agoOPENSOURCE RELEASE

SmallHarness launches version 1.0 with model routing, followed by a hotfix introducing active effort routing to control reasoning depth for local and cloud LLMs.

SmallHarness, a terminal-based developer tool for running agentic LLMs on local hardware and cloud APIs, has officially released version 1.0.0, followed by a quick 1.0.1 update introducing active effort routing. The tool supports multiple backends—including Ollama, LM Studio, MLX, llama.cpp, and OpenRouter—and features safe execution of filesystem/shell commands via interactive approval gates and diff previews. The new active effort routing allows the system to analyze task complexity and automatically set reasoning effort levels (from minimal to max) sent to providers like OpenRouter and OpenAI, offering a highly customizable framework that optimizes performance, cost, and latency.

// ANALYSIS

While mainstream agent frameworks grow increasingly complex and bloated, SmallHarness demonstrates that a lightweight, Rust-powered TUI can provide a faster, safer, and more transparent environment for local and cloud LLMs.

* Active Effort Routing: Automatically scales reasoning effort (from minimal to max) based on task complexity, helping developers manage API costs and local resources.

* Flexible Backend Support: Out-of-the-box routing for Ollama, LM Studio, MLX, llama.cpp, and OpenRouter facilitates easy transitions between offline and online models.

* Secure Agentic Tools: Approval gates and diff previews for filesystem and terminal commands protect the user from destructive operations.

* Fault-Tolerant Parsing: An inline JSON detector extracts tool calls reliably even when smaller local LLMs struggle with formatting constraints.

// TAGS

smallharnessai-agentsrustlocal-firstopen-sourceopenrouter

DISCOVERED

47d ago

2026-06-15

PUBLISHED

47d ago

2026-06-15

RELEVANCE

7/ 10

AUTHOR

morganlinton

// KEEP READING

More AI developer news from the feed

EXPLORE FULL FEED

MODEL40m ago

DeepSeek-V4-Flash-High excels at low-cost frontend coding

AI researcher Elvis Saravia (@omarsar0) highlighted the impressive front-end development capabilities of DeepSeek-V4-Flash-High during recent testing. He noted that the model's output quality was high enough to prompt a double-check of which model was actively being used, praising its performance-to-price ratio.

TUTORIAL1h ago

DAIR.AI offers harness engineering, evals training

DAIR.AI emphasizes harness engineering and model evaluations as essential skills for building production-grade AI applications. The platform is releasing educational resources and courses focused on evaluation harnesses and systematic testing.

TUTORIAL1h ago

Dual Blackwell GPUs run 167 GB DeepSeek-V4 FP8

A developer shared a deployment recipe for running the official FP8 version of DeepSeek-V4-Flash-0731 alongside DSpark speculative decoding on a dual NVIDIA RTX PRO 6000 Blackwell (SM120) GPU rig. Requiring approximately 167 GB of VRAM, the model fits cleanly across the system's combined 192 GB VRAM capacity (2× 96 GB) without offloading or truncation.