DanceOPD unifies T2I, image editing
ByteDance Seed researchers have introduced DanceOPD, an on-policy generative field distillation framework that unifies text-to-image synthesis, local editing, and global editing within a single flow-matching model. The approach resolves common training conflicts between diverse generation tasks by routing training samples through specialist teacher velocity fields.
DanceOPD addresses the capability conflict in image generation, showing that a single model can handle T2I, local editing, and global editing without degrading overall quality. By querying the teacher fields at states drawn from the student's own rollouts, it gives a more stable, higher-performing pipeline for multi-capability distillation.
- **On-Policy Routing:** Routes each training sample to a specific capability field and queries a low-noise student-induced state, avoiding the trajectory mismatch issues of off-policy methods.
- **Improved Composition:** Achieves an 8.1% gain in joint text-to-image/editing tasks and a 16.1% boost in combined local/global editing compared to standard on-policy baselines.
- **CFG Absorption:** The velocity MSE objective seamlessly absorbs operator-defined fields like Classifier-Free Guidance (CFG), eliminating the need for separate, complex CFG post-processing.
- **Unified Architectures:** Reduces the operational overhead of serving multiple expert models (T2I, local editors, global editors) by consolidating their capabilities into a single flow-matching student.
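The routing and CFG-absorption ideas in the list above can be illustrated with a toy NumPy sketch. This is not the DanceOPD implementation: the teacher fields, the `TEACHERS` table, the helper names, the Euler rollout, and the time convention (t = 0 is noise, t = 1 is data) are all assumptions made for illustration. It only shows the general shape of the idea: pick a teacher by task, roll the student out to a low-noise state, and regress the student's velocity onto the teacher's CFG-combined velocity at that state.

```python
import numpy as np

def make_linear_field(scale):
    """Toy velocity field v(x, t) = scale * x * (1 - t); a stand-in for a real network."""
    return lambda x, t: scale * x * (1.0 - t)

# Hypothetical specialist teachers: task -> (conditional field, unconditional field).
TEACHERS = {
    "t2i": (make_linear_field(1.0), make_linear_field(0.5)),
    "local_edit": (make_linear_field(0.8), make_linear_field(0.2)),
    "global_edit": (make_linear_field(1.2), make_linear_field(0.6)),
}

def cfg_field(v_cond, v_uncond, x, t, guidance_scale):
    """Standard classifier-free guidance combination of two velocity predictions."""
    u = v_uncond(x, t)
    return u + guidance_scale * (v_cond(x, t) - u)

def student_rollout(student, x_noise, t_query, n_steps=8):
    """Euler-integrate the student's own field from t=0 up to t_query (on-policy state)."""
    x = x_noise.copy()
    dt = t_query / n_steps
    for i in range(n_steps):
        x = x + dt * student(x, i * dt)
    return x

def routed_velocity_mse(student, task, x_noise, t_query=0.9, guidance_scale=3.0):
    """Route the sample to its task's teacher and score the student at its own state."""
    v_cond, v_uncond = TEACHERS[task]                   # routing step
    x_t = student_rollout(student, x_noise, t_query)    # low-noise, student-induced state
    # In a real training loop x_t and the target would be treated as constants
    # (no gradient); NumPy has no autograd, so that is implicit here.
    target = cfg_field(v_cond, v_uncond, x_t, t_query, guidance_scale)
    pred = student(x_t, t_query)
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
x_noise = rng.standard_normal((4, 8))
student = make_linear_field(1.0)
loss = routed_velocity_mse(student, "t2i", x_noise)
```

Because the regression target already contains the guidance term, a student that drives this loss to zero reproduces the guided field in a single forward pass, which is the sense in which the objective "absorbs" CFG in this sketch.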
DISCOVERED
2026-06-26
PUBLISHED
2026-06-26
AUTHOR
_akhaliq