
CUA-Gym automates the generation of verifiable reinforcement learning environments and datasets for computer-use agents, targeting the training data bottleneck.
CUA-Gym is an open-source pipeline and dataset developed by XLang AI to scale Reinforcement Learning with Verifiable Rewards (RLVR) for Computer-Use Agents (CUAs). It addresses the training data bottleneck by co-generating task instructions, environment states, and Python reward functions using a multi-agent Generator-Discriminator-Orchestrator architecture. The framework is paired with CUA-Gym-Hub, which hosts 98 high-fidelity, self-contained mock web applications designed with state injection and session isolation. This design lets parallel RL training sessions run on shared backends without resource interference. The pipeline has produced over 32,000 verified RLVR training tuples, which are reported to significantly improve CUA performance on benchmarks such as OSWorld and WebArena.
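To make the idea of an instruction, state, and reward tuple concrete, here is a minimal sketch in Python. It is not CUA-Gym's actual schema: the `RLVRTask` class, the shopping-cart state layout, and the `reward_item_in_cart` function are all hypothetical names chosen for illustration. The only assumption taken from the summary above is that each task pairs an instruction and an environment state with a Python reward function that can be checked programmatically.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical representation: an environment state is a plain nested dict.
State = Dict[str, Any]


@dataclass(frozen=True)
class RLVRTask:
    """One training tuple: what to do, where to start, how to score it."""
    instruction: str
    initial_state: State
    reward_fn: Callable[[State], float]


def reward_item_in_cart(final_state: State) -> float:
    """Return 1.0 if the cart holds exactly two units of SKU 'A-100', else 0.0."""
    cart = final_state.get("cart", {})
    return 1.0 if cart.get("A-100") == 2 else 0.0


task = RLVRTask(
    instruction="Add two units of item A-100 to the cart.",
    initial_state={"cart": {}, "inventory": {"A-100": 5}},
    reward_fn=reward_item_in_cart,
)

# The untouched initial state should earn no reward.
reward_before = task.reward_fn(task.initial_state)

# A state the agent might reach after completing the task.
success_state = {"cart": {"A-100": 2}, "inventory": {"A-100": 3}}
reward_after = task.reward_fn(success_state)
```

Because the reward is computed from the final application state rather than judged by a human or a model, the same check can be rerun on every rollout, which is the property that "verifiable" refers to here.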
Scaling reinforcement learning for digital agents has been bottlenecked by state management and a lack of programmatic rewards. CUA-Gym addresses the first with high-fidelity mock environments that support state injection and session isolation, and the second with generated Python reward functions.
* Session isolation and state injection remove a major engineering blocker for CUA reinforcement learning: parallel agents can train on shared local environments without corrupting one another's data.
* The multi-agent Generator-Discriminator-Orchestrator pipeline automates the generation of verifiable instruction-state-reward triplets, bypassing expensive human-in-the-loop task creation.
* Relying on a suite of mock applications yields an efficient sandbox, but transferring these agents to arbitrary real-world web apps still leaves a generalization gap that mock training alone cannot fully close.
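The session isolation and state injection described in the points above can be illustrated with a small sketch. This is an assumption-laden toy, not CUA-Gym-Hub's implementation: `MockAppBackend`, its methods, and the cart-shaped state are hypothetical. It shows one plausible way a single shared backend could hold a private, injected copy of state per session so that parallel rollouts do not overwrite each other.

```python
import copy
import uuid
from typing import Any, Dict

State = Dict[str, Any]


class MockAppBackend:
    """Hypothetical shared backend that keeps one private state per session."""

    def __init__(self) -> None:
        self._sessions: Dict[str, State] = {}

    def create_session(self, injected_state: State) -> str:
        """State injection: start a session from a caller-supplied state."""
        session_id = uuid.uuid4().hex
        # Deep copy so sessions never share mutable objects with the seed.
        self._sessions[session_id] = copy.deepcopy(injected_state)
        return session_id

    def add_to_cart(self, session_id: str, sku: str, qty: int) -> None:
        """An example action that mutates only the given session's state."""
        cart = self._sessions[session_id].setdefault("cart", {})
        cart[sku] = cart.get(sku, 0) + qty

    def get_state(self, session_id: str) -> State:
        """Return a snapshot that a reward function could inspect."""
        return copy.deepcopy(self._sessions[session_id])

    def close_session(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)


backend = MockAppBackend()
seed = {"cart": {}}

# Two parallel rollouts start from the same injected state.
rollout_a = backend.create_session(seed)
rollout_b = backend.create_session(seed)

# Only rollout A acts; rollout B and the seed must stay untouched.
backend.add_to_cart(rollout_a, "A-100", 2)
state_a = backend.get_state(rollout_a)
state_b = backend.get_state(rollout_b)
```

Keying all state by session ID is what lets many agents share one running application instance. A real web backend would more likely scope database rows or storage by a session token, but the isolation principle is the same.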
DISCOVERED
2026-06-13
PUBLISHED
2026-06-13
AUTHOR
AlphaSignalAI