De Freitas proposes causal interactive training

// 1h ago · RESEARCH PAPER

Microsoft AI VP Nando de Freitas proposes a unified training framework for AI agents based on continual, causal interaction streams rather than multi-stage fine-tuning pipelines. By treating world-written tokens as evidence and self-written tokens as interventions, the method achieves competitive reasoning performance with a simpler, single-stream objective.

// ANALYSIS

The proposed framework challenges the complex, multi-stage training recipes of modern LLMs in favor of a single, theoretically grounded interaction stream.

– Multi-stage pipelines (SFT, RLHF, GRPO) are criticized as a research local minimum that lacks clean mathematical semantics for interaction histories.
– By distinguishing between evidence (world-written tokens) and interventions (agent-written tokens), the model simplifies training using a loss mask.
– A STEM reasoning experiment shows the causal agent matches the performance of complex reinforcement learning methods like GRPO.
– The approach draws on universal artificial intelligence as imitation, shifting agent goals from reward maximization to action prediction.
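
The loss-mask idea in the points above can be illustrated with a minimal sketch. It is not the paper's implementation: the function `masked_nll`, the toy `stream`, and the `train_on` setting are all hypothetical, and the summary does not say which class of tokens the mask keeps, so that choice is left as a parameter here.

```python
import math

def masked_nll(token_log_probs, loss_mask):
    """Mean negative log-likelihood over positions where loss_mask is 1.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(token_log_probs) != len(loss_mask):
        raise ValueError("log-probs and mask must have the same length")
    kept = [lp for lp, m in zip(token_log_probs, loss_mask) if m]
    if not kept:
        return 0.0  # nothing selected, so nothing contributes to the loss
    return -sum(kept) / len(kept)

# Toy interaction stream: (who wrote the token, model probability of that token).
# The probabilities are made-up numbers, not results from the paper.
stream = [("world", 0.5), ("agent", 0.25), ("world", 0.8), ("agent", 0.5)]
log_probs = [math.log(p) for _, p in stream]

# ASSUMPTION: which writer's tokens count toward the loss is a free choice here.
train_on = "agent"
mask = [1 if writer == train_on else 0 for writer, _ in stream]

loss = masked_nll(log_probs, mask)
print(mask, round(loss, 4))
```

Every token stays in the context, so the single stream is preserved. The mask only decides which positions contribute to the objective, which is how one loss can cover both evidence and interventions.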

// TAGS

llm, training, reasoning, agent, continual-interactive-causal-agents, nando-de-freitas

DISCOVERED

1h ago

2026-06-25

PUBLISHED

17d ago

2026-06-07

RELEVANCE

8/10

AUTHOR

NandoDF
