tinyforge teaches tiny models from failures
REDDIT // OPEN-SOURCE RELEASE


tinyforge is a new open-source project that uses test-driven repair training to help a sub-1B local model improve its coding performance on laptop hardware. In the posted experiments, a 0.8B model trained on just 13 self-generated repair pairs improved single-pass HumanEval results from 16/50 to 28/50 and got noticeably better at using failure feedback inside a search loop.

// ANALYSIS

The most interesting result here is not the raw benchmark bump but the claim that tiny models can learn how to use verifier feedback, not just memorize answers. If that holds up, tinyforge points to a cheap local recipe for self-improving systems in any domain with automatic checks.

  • The project combines evolutionary search, exact test-failure feedback, and LoRA fine-tuning instead of relying on a larger teacher model
  • The repo argues the biggest gains come from the repair loop itself, with the trained adapter becoming a better “repair partner” when shown what failed
  • Resource requirements are unusually approachable for this kind of work: Apple Silicon, 6GB-13GB peak memory, and a few minutes of training
  • The code is MIT-licensed and already packaged as a runnable CLI, which makes it more than a one-off Reddit experiment
  • The limitation is explicit: this does not magically turn a 0.8B model into GPT-4 class performance; the gains are scoped and system-dependent
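The repair loop the bullets describe can be sketched in a few lines: run the tests, capture the exact failure message, ask the model for a fix, and keep the (broken code, failure, fixed code) triple as a self-generated fine-tuning pair. The sketch below is illustrative only, not tinyforge's actual API; `propose_fix` is a stand-in for the local model call, here hard-coded so the loop runs end to end.

```python
# Hypothetical sketch of a test-driven repair loop (not tinyforge's code).
def run_tests(fn):
    """Run a tiny test suite; return (passed, exact failure message)."""
    try:
        assert fn(2, 3) == 5, f"add(2, 3) returned {fn(2, 3)}, expected 5"
        assert fn(-1, 1) == 0, f"add(-1, 1) returned {fn(-1, 1)}, expected 0"
        return True, ""
    except AssertionError as e:
        return False, str(e)

def propose_fix(source, failure):
    """Stand-in for the local model: in tinyforge this would be a
    sub-1B model prompted with the source and the failure message."""
    return source.replace("a - b", "a + b")

def repair_loop(source, max_rounds=3):
    """Iterate test -> failure -> repair; collect LoRA training pairs."""
    pairs = []  # (broken source, failure message, repaired source)
    for _ in range(max_rounds):
        ns = {}
        exec(source, ns)  # toy sandbox; real systems isolate execution
        passed, failure = run_tests(ns["add"])
        if passed:
            return source, pairs
        fixed = propose_fix(source, failure)
        pairs.append((source, failure, fixed))
        source = fixed
    return source, pairs

broken = "def add(a, b):\n    return a - b\n"
final, pairs = repair_loop(broken)
```

The key design point the repo emphasizes is that the collected `pairs` include the verifier's exact failure text, so the adapter learns to condition on what failed rather than memorizing solutions.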
// TAGS
tinyforge · llm · fine-tuning · ai-coding · testing · open-source

DISCOVERED

2026-03-10

PUBLISHED

2026-03-10

RELEVANCE

8 / 10

AUTHOR

QuantumSeeds