Cursor trains Composer 2.5 via codebase feature deletion
Cursor scaled training tasks 25x for its Composer 2.5 model using an automated code deletion pipeline. The process generates reinforcement learning puzzles by removing features from real codebases and requiring the model to rebuild them to pass existing tests.
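The deletion step can be illustrated with a toy sketch. It is not Cursor's pipeline: the helper names `delete_feature` and `passes`, the `slugify` example, and the choice to stub out a single function body are all assumptions made for illustration. It shows only the core idea: remove a feature from working code, then confirm the existing tests pass on the original and fail on the stripped version.

```python
import ast


def delete_feature(source: str, func_name: str) -> str:
    """Return `source` with the body of `func_name` replaced by a stub (hypothetical helper)."""
    tree = ast.parse(source)
    found = False
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func_name:
            # Keep the signature so callers still import, but drop the implementation.
            node.body = [ast.parse("raise NotImplementedError").body[0]]
            found = True
    if not found:
        raise ValueError(f"no function named {func_name!r} in source")
    return ast.unparse(tree)  # requires Python 3.9+


def passes(source: str, test_source: str) -> bool:
    """Run assertion-style tests against `source`; True only if nothing raises."""
    namespace = {}
    try:
        exec(source, namespace)
        exec(test_source, namespace)
    except Exception:
        return False
    return True


# Toy "codebase" and its pre-existing tests (both invented for this example).
ORIGINAL = '''
def slugify(title):
    return "-".join(title.lower().split())
'''

TESTS = '''
assert slugify("Hello World") == "hello-world"
assert slugify("  A  B ") == "a-b"
'''

task_source = delete_feature(ORIGINAL, "slugify")

# A usable task: the tests pass before deletion and fail after it.
is_valid_task = passes(ORIGINAL, TESTS) and not passes(task_source, TESTS)
```

A real pipeline would presumably remove features that span several files and run the repository's own test runner in a sandbox; the single-function stub here only keeps the example short.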
Automated feature deletion paired with test-driven evaluation is a scalable way to generate reinforcement learning tasks for AI coding models. It yields realistic training tasks without the high cost of manual annotation.
- Feature deletion puzzles ground model training in real production patterns rather than synthetic toy tasks.
- Existing unit and integration test suites serve as cheap, deterministic reward functions.
- Targeted textual feedback injected during trajectories helps with credit assignment over long sequences of coding actions.
- Scaling task diversity 25x directly improves the agent's generalization across multi-file edits and tool calls.
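To make the "tests as reward" idea concrete, here is a minimal sketch of a reward function. It assumes, purely for illustration, that the reward is the fraction of checks passed and that failure messages double as textual feedback; the source does not say how Cursor computes rewards or phrases its feedback. The names `score`, `check_lowercases`, and `check_collapses_whitespace` are hypothetical.

```python
from typing import Callable, List, Tuple

Candidate = Callable[[str], str]
Check = Callable[[Candidate], None]


def check_lowercases(slugify: Candidate) -> None:
    assert slugify("Hello World") == "hello-world"


def check_collapses_whitespace(slugify: Candidate) -> None:
    assert slugify("  A  B ") == "a-b"


def score(candidate: Candidate, checks: List[Check]) -> Tuple[float, List[str]]:
    """Return (fraction of checks passed, one feedback line per failing check)."""
    passed = 0
    feedback: List[str] = []
    for check in checks:
        try:
            check(candidate)
            passed += 1
        except Exception as exc:
            # The failure text is what a trainer could surface to the model.
            feedback.append(f"{check.__name__}: {type(exc).__name__}: {exc}")
    return passed / len(checks), feedback


CHECKS = [check_lowercases, check_collapses_whitespace]


def full_rebuild(title: str) -> str:
    """A rebuilt feature that handles repeated and leading whitespace."""
    return "-".join(title.lower().split())


def partial_rebuild(title: str) -> str:
    """A rebuilt feature that mishandles repeated and leading whitespace."""
    return title.lower().replace(" ", "-")


full_reward, full_feedback = score(full_rebuild, CHECKS)
partial_reward, partial_feedback = score(partial_rebuild, CHECKS)
```

Because the checks are deterministic, the same candidate always receives the same reward, which is the property that makes an existing test suite attractive as a grader.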
DISCOVERED
2026-06-24
PUBLISHED
2026-06-24
AUTHOR
tibor_tee