Diffusion video reproducibility drifts across GPUs
Even with identical weights, prompt, sampler, and starting noise, diffusion video outputs are not guaranteed to match across GPU architectures. In practice you usually get the same broad scene with some perceptual drift in fine details, but long denoising chains can make those differences more visible.
The short version: fixed latent plus deterministic sampling is necessary, but not sufficient for cross-GPU reproducibility. Tiny floating-point and kernel-order differences can accumulate across many steps, so “same output” is a stronger claim than most stacks can support.
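As a concrete baseline, here is a minimal sketch of that "necessary" half, assuming a PyTorch-based pipeline. The function name, seed, and latent shape are illustrative, not from any specific model:

```python
import os

# Some deterministic cuBLAS paths require this; set it before CUDA is initialized.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

import torch

def make_fixed_latent(shape, seed=1234, device=None):
    """Seeded starting noise, generated on CPU so the values do not
    depend on any particular GPU's RNG implementation."""
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    gen = torch.Generator(device="cpu").manual_seed(seed)
    return torch.randn(shape, generator=gen).to(device)

torch.manual_seed(1234)
torch.use_deterministic_algorithms(True)  # raise on known-nondeterministic ops
torch.backends.cudnn.benchmark = False    # no autotuned, hardware-dependent kernel picks

# Hypothetical latent layout: (batch, channels, frames, height, width).
latent = make_fixed_latent((1, 4, 16, 64, 64))
```

This pins the run on one machine; none of it forces two different GPU architectures to produce the same bits.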
- The biggest risk is not the seed; it is backend variance from attention kernels, matmul precision, and reduction order (see the first sketch after this list)
- On a stable stack, architecture differences usually show up first in textures, faces, edges, and other high-frequency details
- Video diffusion is more sensitive than still-image generation because errors compound over more frames and more denoising steps
- If you need reproducibility, lock the entire software stack and validate perceptual similarity, not bitwise equality (see the second sketch after this list)
- The practical question is whether drift stays within “same idea, slightly different render” or crosses into “different clip”; that depends on how numerically brittle the model is
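To pin down the backend variance named in the first bullet, a hedged sketch assuming PyTorch 2.3+ (the `torch.nn.attention` API): disable TF32 matmuls and force the reference math attention kernel, which avoids the architecture-dependent tiling and reduction order of the fused kernels. Tensor shapes here are toy values.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

torch.backends.cuda.matmul.allow_tf32 = False  # full-precision fp32 matmuls
torch.backends.cudnn.allow_tf32 = False

device = "cuda" if torch.cuda.is_available() else "cpu"
# Toy (batch, heads, seq, head_dim) tensors standing in for a real attention call.
q = torch.randn(1, 8, 16, 64, device=device)
k = torch.randn(1, 8, 16, 64, device=device)
v = torch.randn(1, 8, 16, 64, device=device)

# Flash and memory-efficient kernels tile and reduce in an order that varies
# by architecture; the MATH backend is the slow but reference path.
with sdpa_kernel(SDPBackend.MATH):
    out = F.scaled_dot_product_attention(q, k, v)
```

The cost is speed: the math backend can be dramatically slower than the fused kernels, which is why most stacks do not ship with it on.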
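And for the validation bullet, a sketch of checking perceptual tolerance instead of bitwise equality. Per-frame PSNR is assumed here as a cheap proxy, and the 35 dB floor is illustrative, not a standard; production checks often use SSIM or LPIPS instead.

```python
import torch

def frames_psnr(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Per-frame PSNR in dB for clips shaped (frames, channels, h, w) in [0, 1]."""
    mse = ((a - b) ** 2).flatten(1).mean(dim=1)
    return 10.0 * torch.log10(1.0 / mse.clamp_min(1e-12))

def clips_match(a: torch.Tensor, b: torch.Tensor, min_psnr_db: float = 35.0) -> bool:
    # Accept the render only if every frame stays above the PSNR floor.
    return bool((frames_psnr(a, b) >= min_psnr_db).all())

# Toy data standing in for decoded frames from two GPU architectures.
a = torch.rand(8, 3, 64, 64)
b = (a + 0.002 * torch.randn_like(a)).clamp(0.0, 1.0)
print(clips_match(a, b))  # True: drift this small is "same clip"
```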