De Freitas proposes causal interactive training

// 1h ago · RESEARCH PAPER

Microsoft AI VP Nando de Freitas proposes a unified training framework for AI agents based on continual, causal interaction streams rather than multi-stage fine-tuning pipelines. By treating world-written tokens as evidence and self-written tokens as interventions, the method achieves competitive reasoning performance with a simpler, single-stream objective.

// ANALYSIS

The proposed framework challenges the complex, multi-stage training recipes of modern LLMs in favor of a single, theoretically grounded interaction stream.

  • Multi-stage pipelines (SFT, RLHF, GRPO) are criticized as a research local minimum, and as lacking clean mathematical semantics for interaction histories.
  • By distinguishing between evidence (world-written tokens) and interventions (agent-written tokens), the framework simplifies training by using a loss mask.
  • A STEM reasoning experiment shows the causal agent matches the performance of complex reinforcement learning methods like GRPO.
  • The approach draws on universal artificial intelligence as imitation, shifting agent goals from reward maximization to action prediction.
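
The loss-mask idea in the bullets above can be illustrated with a minimal sketch. This is not the paper's implementation: the role labels, the function names `build_loss_mask` and `masked_nll`, and the toy probabilities are all hypothetical, and the summary does not say which side of the stream is masked, so the sketch leaves that choice as the `train_on` parameter.

```python
import math

# Hypothetical role labels for each token in one interaction stream.
WORLD = "world"  # evidence: tokens written by the environment
AGENT = "agent"  # interventions: tokens written by the model itself


def build_loss_mask(roles, train_on):
    """Return 1.0 where a token's role is in `train_on`, else 0.0."""
    return [1.0 if role in train_on else 0.0 for role in roles]


def masked_nll(token_log_probs, mask):
    """Mean negative log-likelihood over the unmasked positions only."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    total = -sum(lp * m for lp, m in zip(token_log_probs, mask))
    return total / kept


# Toy stream: two world tokens, two agent tokens, one world token.
roles = [WORLD, WORLD, AGENT, AGENT, WORLD]
# Made-up per-token log-probabilities assigned by a model.
log_probs = [math.log(p) for p in (0.5, 0.25, 0.9, 0.8, 0.5)]

# Assumption for illustration only: score the world-written tokens.
mask = build_loss_mask(roles, train_on={WORLD})
loss = masked_nll(log_probs, mask)
```

Passing `train_on={AGENT}` instead scores only the agent-written tokens. Either way the objective stays a single next-token loss over one stream, with the mask alone deciding which positions contribute.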
// TAGS
llm, training, reasoning, agent, continual-interactive-causal-agents, nando-de-freitas

DISCOVERED: 1h ago (2026-06-25)

PUBLISHED: 17d ago (2026-06-07)

RELEVANCE: 8/10

AUTHOR: NandoDF