Stability AI releases Stable-Layers, a reinforcement learning framework that trains image layer decomposition models without paired supervision data using Flow-GRPO and VLM feedback.
Stability AI has introduced Stable-Layers, a reinforcement learning framework designed to train image layer decomposition models without requiring paired training datasets. Traditionally, training a model to split a flat image into editable, multi-layer components required paired examples built through intensive human annotation. Stable-Layers bypasses this by adapting Group Relative Policy Optimization (GRPO) to flow-matching models (Flow-GRPO) and using it to optimize image decomposition. Training is guided by a Vision-Language Model (VLM) acting as a judge, which uses a structured scoring and grid-based calibration pipeline to provide the reward signal. The approach reportedly reduces color bleed and blank-layer artifacts, producing cleaner semantic separation between layers.
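The group-relative step at the core of GRPO can be illustrated with a minimal sketch. It assumes the standard GRPO formulation, in which each sample's reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, and the example scores are hypothetical, and the flow-matching sampling and policy update that Flow-GRPO adds are not shown.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group (GRPO-style).

    Each advantage is (reward - group mean) / (group std + eps), so a
    sample is credited only for beating its siblings, not for its raw score.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical judge scores for four decompositions sampled from one image.
scores = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(scores)
```

Because advantages are computed relative to the group, no learned value function or ground-truth layer data is needed; only a reward for each sampled decomposition.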
Automated layer decomposition could substantially speed up graphic design workflows, and Stable-Layers suggests that RL-based self-improvement with VLM feedback can reduce the need for costly paired datasets.
- **Data Bottleneck Solution:** Training models to generate editable RGBA layers without paired ground-truth data shows that VLM-as-a-judge pipelines are viable for complex structural tasks.
- **Flow-GRPO Integration:** Applying GRPO advantages to flow-matching models extends reinforcement learning techniques deeper into generative image pipelines.
- **Clever Reward Design:** Using structured criteria combined with relative comparison grids overcomes the typical compression and bias issues of standalone VLM scoring.
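To make the reward-design point concrete, here is a hypothetical sketch of how absolute per-criterion scores might be blended with a relative ranking taken from a side-by-side comparison grid. The summary does not describe the actual Stable-Layers pipeline at this level of detail, so the criterion names, the equal-weight blend, and every function below are assumptions chosen only to illustrate why a relative signal can counter score compression.

```python
def structured_score(criteria):
    """Average per-criterion scores, each assumed to lie in [0, 1]."""
    return sum(criteria.values()) / len(criteria)


def rank_to_unit(ranking, n):
    """Map a best-first ranking of candidate indices to values in [0, 1]."""
    if n == 1:
        return [1.0]
    out = [0.0] * n
    for position, idx in enumerate(ranking):
        out[idx] = 1.0 - position / (n - 1)
    return out


def calibrated_rewards(per_candidate_criteria, grid_ranking, weight=0.5):
    """Blend absolute criterion scores with a relative grid ranking.

    `weight` is a hypothetical mixing coefficient, not a published value.
    """
    n = len(per_candidate_criteria)
    absolute = [structured_score(c) for c in per_candidate_criteria]
    relative = rank_to_unit(grid_ranking, n)
    return [(1 - weight) * a + weight * r for a, r in zip(absolute, relative)]


# Hypothetical judge output: absolute scores are compressed near 0.7,
# but the grid comparison still orders the candidates clearly.
candidates = [
    {"color_bleed": 0.70, "blank_layers": 0.70, "separation": 0.70},
    {"color_bleed": 0.72, "blank_layers": 0.72, "separation": 0.72},
    {"color_bleed": 0.71, "blank_layers": 0.71, "separation": 0.71},
]
ranking = [1, 2, 0]  # best-first candidate indices from the grid comparison
rewards = calibrated_rewards(candidates, ranking)
```

In this toy case the absolute scores differ by only 0.02, while the blended rewards are spread far more widely, which gives a group-relative optimizer a clearer signal to act on.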
Discovered: 2026-06-07
Published: 2026-06-07
Author: AI Search