Dynabatch boosts MT throughput with dynamic batching
Dynabatch is a PyTorch sampler that increases batch sizes for shorter examples after length sorting, using a learned GPU-memory model to stay under a safe baseline. It targets encoder-decoder workloads, and the reported throughput gains are benchmark-specific.
Hot take: this is a practical niche tool for MT and other encoder-decoder workloads, not a universal batching strategy.
- Best fit is variable-length seq2seq training where source length correlates with target length and padding waste is the main bottleneck.
- The approach is empirical, so it can work well on one model/tokenizer/hardware stack and drift on another.
- The fallback-on-OOM design is sensible, because the regressor can still overpredict memory headroom.
- The headline throughput win is credible for the setup it was measured on, but it should not be read as a general claim across models, tokenizers, or hardware.
- For decoder-only workloads, packing is still likely the cleaner first choice.
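
To make the core idea concrete, here is a minimal sketch of length-sorted batching that grows the batch size for shorter examples while staying within a budget set by a known-safe baseline batch. It is not Dynabatch's actual code or API: the function names are hypothetical, and the learned GPU-memory model is replaced by a simple padded-token proxy (`batch_size * max_len`), which is an assumption made purely for illustration.

```python
from typing import Callable, List, Sequence


def padded_token_cost(batch_size: int, max_len: int) -> float:
    """Hypothetical stand-in for a learned memory model: padded token count."""
    return float(batch_size * max_len)


def dynamic_batches(
    lengths: Sequence[int],
    baseline_batch_size: int,
    cost_fn: Callable[[int, int], float] = padded_token_cost,
) -> List[List[int]]:
    """Group example indices so each batch's predicted cost stays within the
    cost of a baseline-sized batch made of the longest examples."""
    if not lengths:
        return []
    # Sort longest first, so the first item in a batch sets its padded length.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    # Budget: predicted cost of the assumed-safe baseline batch at max length.
    budget = cost_fn(baseline_batch_size, lengths[order[0]])

    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        max_len = lengths[current[0]] if current else lengths[idx]
        # Close the batch if adding one more example would exceed the budget.
        if current and cost_fn(len(current) + 1, max_len) > budget:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    example_lengths = [100, 100, 50, 50, 50, 50] + [10] * 8
    result = dynamic_batches(example_lengths, baseline_batch_size=2)
    print([len(b) for b in result])  # [2, 4, 8]
```

With this proxy, halving the sequence length doubles the allowed batch size. A learned regressor would replace `cost_fn`, and because any such predictor can be wrong, a real training loop would still want an out-of-memory fallback around the forward and backward pass.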
Discovered: 2026-04-28 (45d ago)
Published: 2026-04-28 (45d ago)
Author: Leather_Loan5314