CogArch trains LLMs via competitive self-play

// 90d ago · OPEN SOURCE RELEASE

CogArch is an open-source self-improvement framework in which two LLMs compete to solve coding problems. Unit-test execution decides which solution wins, and the resulting winner/loser pairs become DPO training data, giving a verifiable alignment signal without human labels.
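The pair-generation step can be sketched roughly as follows. This is a minimal illustration, not CogArch's actual code: `PreferencePair`, `pass_rate`, and `build_pairs` are hypothetical names, tests are assumed to be plain `assert` statements, and the bare `exec` call stands in for the sandboxed execution a real system would need.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class PreferencePair:
    """One DPO example (hypothetical structure, not CogArch's schema)."""
    prompt: str
    chosen: str    # solution that passed more tests
    rejected: str  # solution that passed fewer tests


def pass_rate(solution_src: str, tests: list[str]) -> float:
    """Fraction of assert-style tests a candidate passes.

    Assumption: plain exec() with no sandbox, for illustration only.
    """
    namespace: dict = {}
    try:
        exec(solution_src, namespace)  # define the candidate's functions
    except Exception:
        return 0.0  # code that does not even load scores zero
    passed = 0
    for test in tests:
        try:
            exec(test, namespace)
            passed += 1
        except Exception:
            pass  # failed assertion or runtime error counts as a miss
    return passed / len(tests) if tests else 0.0


def build_pairs(prompt: str, candidates: list[str], tests: list[str]) -> list[PreferencePair]:
    """Pair every strictly better candidate with every strictly worse one."""
    scored = [(pass_rate(src, tests), src) for src in candidates]
    return [
        PreferencePair(prompt, better, worse)
        for (s_better, better), (s_worse, worse) in permutations(scored, 2)
        if s_better > s_worse
    ]


prompt = "Write add(a, b) that returns the sum of a and b."
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
tests = ["assert add(1, 2) == 3", "assert add(0, 0) == 0"]

pairs = build_pairs(prompt, [good, bad], tests)
```

Candidates that tie on pass rate produce no pair here, since there is no execution evidence to prefer one over the other.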

// ANALYSIS

CogArch offers early evidence that verifiable rewards (here, code execution) can drive model improvement without a human in the loop, echoing the verifiable-reward training associated with reasoning models like o1 and DeepSeek-R1.

–Replacing the standard "judge model" with raw execution results removes judge-model bias from the reward and grounds it in test outcomes, although the signal is only as reliable as the unit tests themselves.
–Using DPO instead of PPO or GRPO keeps training offline on fixed preference pairs, which tends to make the loop more stable and computationally accessible for developers with local hardware.
–A memory system lets agents retrieve lessons from past errors, such as off-by-one mistakes, before their first attempt at a new problem.
–Multi-specialist agents sampling at different temperatures encourage diversity in generated solutions, which is critical for creating high-quality preference pairs.
–Early results show a +1.2pp gain on HumanEval from just 39 training pairs. That points to good sample efficiency, but on a 164-problem benchmark it amounts to roughly two additional solved problems, so the result is preliminary.
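For reference, the standard DPO objective for a single preference pair can be written in a few lines. This is a generic sketch of the published loss, not CogArch's training code: the function name, its arguments, and the default `beta=0.1` are illustrative assumptions, and each log-probability is assumed to be the summed token log-likelihood of a full response.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift
) -> float:
    """DPO loss for one (chosen, rejected) pair: -log(sigmoid(beta * margin))."""
    # How much more the policy favors each response than the frozen reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = chosen_gain - rejected_gain
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow for large |x|.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: the loss is log(2).
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)

# Policy has shifted toward the passing solution: the loss drops below log(2).
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

Because the loss depends only on log-probabilities from the policy and a frozen reference model, no separate reward model or online rollout is needed during optimization, which is the property the DPO bullet above relies on.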

// TAGS

cogarch · ai-coding · llm · fine-tuning · agent · open-source · reasoning

DISCOVERED

90d ago

2026-04-16

PUBLISHED

90d ago

2026-04-16

RELEVANCE

8/10

AUTHOR

Outrageous_Mark9761
