GLM-5.2 adopts critic-based PPO training

// 45d ago · MODEL RELEASE

Zhipu AI's open-weights release GLM-5.2 marks a design pivot back to critic-based PPO training, away from group-wise variance-reduction methods such as GRPO. The critic-based setup computes token-level advantages for each individual rollout, which better supports trace compaction in long-horizon agentic tasks and 1M-token context windows.
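To make "token-level advantages" concrete, here is a minimal sketch of Generalized Advantage Estimation (GAE), the estimator commonly paired with a critic in PPO. Whether GLM-5.2 uses GAE in exactly this form is an assumption. The function name, the `gamma` and `lam` defaults, and the toy inputs are illustrative and do not come from the release.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Per-token advantages for ONE rollout, using critic estimates.

    rewards[t]: reward received after emitting token t (often 0 until the end).
    values[t]:  critic's value estimate V(s_t) before emitting token t.
    The value after the final token is treated as 0 (episode ends there).
    Hypothetical helper for illustration; not taken from any GLM codebase.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the accumulated future advantage.
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        # TD residual: how much better this step went than the critic expected.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Toy rollout: a single terminal reward, with the critic growing more confident.
advantages = gae_advantages([0.0, 0.0, 1.0], [0.2, 0.6, 0.9], gamma=1.0, lam=1.0)
```

Each token gets its own advantage from a single rollout, so no sibling rollouts are needed as a baseline.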

// ANALYSIS

While much of the industry recently moved toward group-based reinforcement learning (such as GRPO) to avoid the compute overhead of a critic model, Zhipu's return to critic-based PPO suggests that group-wise variance reduction scales poorly to ultra-long contexts and complex, multi-turn reasoning.
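For contrast, a group-based baseline in the style of GRPO can be sketched as follows. This is a simplified illustration, not Zhipu's or any library's implementation: the function names are hypothetical, and using the population standard deviation with a small `eps` is an assumption.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(
    group_rewards: List[float], eps: float = 1e-8
) -> List[float]:
    """One scalar advantage per rollout, normalized within its sampling group.

    group_rewards: final rewards of several rollouts sampled for the SAME prompt.
    Hypothetical helper for illustration only.
    """
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)  # assumption: population std, eps for stability
    return [(r - mu) / (sigma + eps) for r in group_rewards]


def broadcast_to_tokens(
    advantages: List[float], lengths: List[int]
) -> List[List[float]]:
    """Every token in a rollout inherits that rollout's single advantage."""
    return [[a] * n for a, n in zip(advantages, lengths)]


# Four rollouts for one prompt: two succeed, two fail.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
per_token = broadcast_to_tokens(advs, [3, 2, 4, 1])
```

No critic is needed, but the signal is the same for every token in a rollout, and it depends on having several comparable rollouts for the same prompt. If all rollouts in a group receive the same reward, every advantage is zero.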

– The Critic's Return: A dedicated critic lets the model compute token-level advantages, which is crucial when tracing complex, long-horizon decision trees.
– Compaction Support: Group-wise methods struggle to compare outputs against one another once traces are compacted; the critic-based setup natively supports compacted sub-traces.
– Agentic Specialization: This structural change positions GLM-5.2 as a highly capable model for project-scale engineering and multi-step agent actions.

// TAGS

glm-5.2, zhipu, reinforcement-learning, ppo, grpo, open-weights, agent, long-context, llm

DISCOVERED

45d ago

2026-06-17

PUBLISHED

45d ago

2026-06-17

RELEVANCE

8/10

AUTHOR

jeremyphoward
