xAI Grok 1.5T enters reinforcement learning
Elon Musk has confirmed that xAI's 1.5T-parameter Grok model is currently undergoing reinforcement learning (RL). This suggests that pre-training of the base model is complete and that the development team has moved on to post-training, where safety, alignment, and task performance are refined before a public release.
xAI is moving quickly to train and align very large models, but reinforcement learning on a 1.5T-parameter model is a major computational challenge that will test the limits of its GPU clusters.
- **Compute Intensity:** Conducting RL on a 1.5-trillion-parameter model requires very large amounts of high-bandwidth memory and compute, so the run is likely to make heavy use of xAI's infrastructure.
- **Release Timeline:** The move to RL suggests that the base model is complete, pointing to a potential release within the next few months if safety and alignment tuning go smoothly.
- **Competitive Landscape:** A 1.5T-parameter model would put Grok in direct competition with frontier models from OpenAI and Anthropic in terms of raw capacity and reasoning capabilities.
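For a rough sense of scale behind the compute point above, here is a back-of-envelope sketch of the memory needed just to hold 1.5T parameters. The storage formats, the 80 GB per-device figure, and the function names are illustrative assumptions; the source does not state the model's precision, architecture, or hardware.

```python
import math

PARAMS = 1.5e12  # parameter count reported in the item

# Assumed bytes per parameter for common storage formats (not from the source).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

# Hypothetical per-accelerator memory in GB; xAI's actual hardware is not stated.
ACCELERATOR_GB = 80


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def min_accelerators(memory_gb: float, per_device_gb: float = ACCELERATOR_GB) -> int:
    """Lower bound on the number of devices needed just to store the weights."""
    return math.ceil(memory_gb / per_device_gb)


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(PARAMS, nbytes)
        print(f"{fmt}: {gb / 1000:.1f} TB of weights, >= {min_accelerators(gb)} devices")
```

These figures are floors for a single copy of the weights. An RL run typically also needs memory for optimizer state, activations, and rollout generation, so the real footprint would be considerably larger.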
DISCOVERED
2026-06-07
PUBLISHED
2026-06-07
AUTHOR
mark_k