simple_dlm makes diffusion LMs approachable
simple_dlm is a tiny open-source diffusion language model implementation trained on Karpathy's Tiny Shakespeare dataset, with a 7.5M-parameter character-level model and a 66-token vocabulary. The repo is more a learning artifact than a production model, but it gives developers a compact path into masked/discrete diffusion for text.
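A character-level vocabulary this small is easy to build by hand. Tiny Shakespeare contains 65 distinct characters, so one plausible reading of the 66-token figure is 65 characters plus a reserved mask token. That is an assumption, not something the repo description confirms. The sketch below shows such a tokenizer; `CharTokenizer`, `mask_id`, and the `_` placeholder are hypothetical names, not simple_dlm's API.

```python
from typing import List


class CharTokenizer:
    """Hypothetical character-level tokenizer for masked diffusion.

    Each distinct character gets an integer id, and one extra id is
    reserved for the [MASK] token (an assumption about the design).
    """

    def __init__(self, text: str) -> None:
        self.chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(self.chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}
        # The mask id sits just past the last character id.
        self.mask_id = len(self.chars)
        self.vocab_size = len(self.chars) + 1

    def encode(self, s: str) -> List[int]:
        return [self.stoi[ch] for ch in s]

    def decode(self, ids: List[int], mask_char: str = "_") -> str:
        # Render masked positions with a placeholder so partial samples stay readable.
        return "".join(mask_char if i == self.mask_id else self.itos[i] for i in ids)


tok = CharTokenizer("To be, or not to be")
print(tok.vocab_size)                      # 10 (9 distinct characters + mask)
print(tok.decode(tok.encode("to be")))     # to be
print(tok.decode(tok.encode("to") + [tok.mask_id]))  # to_
```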
The real value here is demystification: diffusion language models still sound exotic, and a small repo that runs on an M2 Air can make the mechanics feel inspectable.
- Implements a hand-built diffusion language model rather than wrapping a large framework
- Uses a tiny character-level setup, which keeps tokenizer, masking, training, and sampling concepts visible
- Fits the current wave of interest around non-autoregressive and masked diffusion text generation
- Output quality is intentionally rough, but the project works as a practical learning scaffold
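Masked (absorbing-state) diffusion for text generally follows one loop. The forward process replaces each token with a mask token at some rate t. The model learns to predict the original tokens at the masked positions. Sampling starts from an all-mask sequence and reveals positions over several steps. The sketch below illustrates that loop in plain Python under those assumptions. `corrupt`, `training_pair`, and `sample` are hypothetical helpers, and the `predict` callable stands in for a trained denoiser. This is not simple_dlm's code, and the repo's noise schedule and unmasking order may differ.

```python
import random
from typing import Callable, List, Sequence, Tuple


def corrupt(ids: Sequence[int], t: float, mask_id: int, rng: random.Random) -> List[int]:
    """Forward (noising) step: each token is independently replaced by
    mask_id with probability t."""
    return [mask_id if rng.random() < t else tok for tok in ids]


def training_pair(ids: Sequence[int], mask_id: int,
                  rng: random.Random) -> Tuple[List[int], List[bool]]:
    """Build one hypothetical training example: draw a noise level t,
    corrupt the input, and flag which positions should contribute to
    the cross-entropy loss (only the masked ones)."""
    t = rng.random()
    noisy = corrupt(ids, t, mask_id, rng)
    loss_mask = [tok == mask_id for tok in noisy]
    return noisy, loss_mask


def sample(length: int, steps: int, mask_id: int,
           predict: Callable[[List[int]], List[int]],
           rng: random.Random) -> List[int]:
    """Reverse process: start fully masked and, at each step, reveal a
    share of the still-masked positions using the model's predictions."""
    x = [mask_id] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(x) if tok == mask_id]
        if not masked:
            break
        # Ceil division: spread the remaining masks over the remaining steps,
        # so the final step always reveals everything that is left.
        k = -(-len(masked) // (steps - step))
        guesses = predict(x)
        for i in rng.sample(masked, k):
            x[i] = guesses[i]
    return x


MASK = 9
target = [1, 2, 3, 4, 5, 6]
oracle = lambda x: target  # stand-in for a trained denoiser
print(sample(len(target), steps=3, mask_id=MASK, predict=oracle, rng=random.Random(0)))
# [1, 2, 3, 4, 5, 6]
```

Revealing positions in random order is the simplest choice; some masked diffusion samplers instead unmask the positions the model is most confident about first.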
Discovered: 2026-04-21
Published: 2026-04-21
Author: Encrux615