OPEN_SOURCE
HN · HACKER_NEWS // 12d ago · TUTORIAL
HJB Tutorial Bridges RL, Diffusion Models
Daniel Lopez Montero's post presents the Hamilton-Jacobi-Bellman equation as the continuous-time counterpart of Bellman's optimality equation, then walks through policy iteration, model-free continuous-time Q-learning, and two benchmark problems: stochastic LQR and the Merton portfolio. It closes by showing how reverse-time diffusion sampling can be reframed as a control problem in which the score function acts as the optimal drift correction.
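For reference, the two standard equations the summary alludes to, in their textbook forms (not quoted from the post): the finite-horizon stochastic HJB equation, and Anderson's reverse-time SDE in which the score supplies the drift correction.

```latex
% Finite-horizon stochastic HJB for dynamics dx = f(x,u)\,dt + \sigma\,dW_t
% with running cost \ell(x,u) and value function V(t,x):
-\partial_t V(t,x) = \min_u \Big[ \ell(x,u) + \nabla_x V(t,x)^{\top} f(x,u)
    + \tfrac{1}{2}\,\mathrm{Tr}\big(\sigma \sigma^{\top} \nabla_x^2 V(t,x)\big) \Big]

% Reverse-time sampling SDE: the score \nabla_x \log p_t(x) appears as the
% drift correction that the post recasts as an optimal control:
dx = \big[ f(x,t) - g(t)^2\, \nabla_x \log p_t(x) \big]\,dt + g(t)\,d\bar{W}_t
```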
// ANALYSIS
This is the rare theory-heavy AI tutorial that earns its length: it gives one clean control-theoretic frame for continuous-time RL and diffusion models, which makes both topics feel like different views of the same math.
- The LQR and Merton examples are the right validation cases because they have closed-form optima and let the neural policy-iteration setup prove itself.
- The diffusion section is the most interesting part: reverse-time sampling becomes a finite-horizon control problem, and the score function emerges as the optimal drift correction.
- The post assumes comfort with SDEs, PDEs, and convex duality, so it is more of an advanced bridge piece than a beginner-friendly walkthrough.
- HN traction suggests there is still a hungry audience for rigorous AI math when it pays off with a unifying story.
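The closed-form LQR baseline mentioned above is easy to sanity-check. The sketch below uses a deterministic scalar LQR (the post's benchmark is the stochastic version, and the coefficient values here are arbitrary, not taken from the post) to show the HJB/Riccati mechanics: solve the Riccati equation for the quadratic value function, derive the linear optimal control, and confirm the HJB residual vanishes.

```python
import math

# Hypothetical toy instance of scalar deterministic infinite-horizon LQR.
# Dynamics: dx/dt = a*x + b*u; cost: integral of q*x^2 + r*u^2.
a, b, q, r = -0.5, 1.0, 1.0, 0.1

# Algebraic Riccati equation for V(x) = p*x^2:  b^2*p^2/r - 2*a*p - q = 0,
# whose positive root is:
p = (r / b**2) * (a + math.sqrt(a**2 + b**2 * q / r))

def optimal_u(x):
    # First-order condition of the HJB minimization: u* = -(b*p/r) * x
    return -(b * p / r) * x

def hjb_residual(x):
    # HJB: 0 = min_u [ q*x^2 + r*u^2 + V'(x)*(a*x + b*u) ],  V'(x) = 2*p*x.
    # At the optimal u this should be zero (up to floating point).
    u = optimal_u(x)
    return q * x**2 + r * u**2 + 2 * p * x * (a * x + b * u)

for x in (0.5, 1.0, 2.0):
    print(f"x={x}: HJB residual = {hjb_residual(x):.2e}")
```

A neural policy-iteration scheme like the post's can be validated the same way: compare its learned value and policy against `p` and `optimal_u`, which are exact for this class of problems.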
// TAGS
continuous-rl · research · agent
DISCOVERED
12d ago (2026-03-30)
PUBLISHED
13d ago (2026-03-30)
RELEVANCE
7/10
AUTHOR
sebzuddas