HJB Tutorial Bridges RL, Diffusion Models
OPEN_SOURCE ↗
HN · HACKER_NEWS // 12d ago // TUTORIAL

Daniel Lopez Montero's post presents the Hamilton-Jacobi-Bellman equation as the continuous-time counterpart of Bellman's dynamic-programming equation, then walks through policy iteration, model-free continuous-time Q-learning, and two benchmark problems: stochastic LQR and the Merton portfolio. It closes by showing how reverse-time diffusion sampling can be reframed as a control problem, with the score function acting as the optimal drift correction.
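For orientation, the two equations the post revolves around have standard textbook forms (our notation, which may differ from the post's). The finite-horizon stochastic HJB equation for a value function $V$ with reward $r$, drift $b$, and diffusion $\sigma$:

```latex
\partial_t V(t,x) + \sup_{a}\Big[\, r(x,a) + b(x,a)^{\top}\nabla_x V(t,x)
  + \tfrac{1}{2}\operatorname{tr}\!\big(\sigma(x,a)\sigma(x,a)^{\top}\nabla_x^{2} V(t,x)\big) \Big] = 0,
\qquad V(T,x) = g(x).
```

And the reverse-time sampling SDE for a diffusion model with forward drift $f$ and noise scale $g$, where the score $\nabla_x \log p_t$ supplies the drift correction the post interprets as an optimal control:

```latex
\mathrm{d}X_t = \big[\, f(X_t,t) - g(t)^{2}\,\nabla_x \log p_t(X_t) \big]\,\mathrm{d}t
  + g(t)\,\mathrm{d}\bar{W}_t .
```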

// ANALYSIS

This is the rare theory-heavy AI tutorial that earns its length: it gives one clean control-theoretic frame for continuous-time RL and diffusion models, which makes both topics feel like different views of the same math.

  • The LQR and Merton examples are the right validation cases: both admit closed-form optima against which the neural policy-iteration setup can be checked.
  • The diffusion section is the most interesting part: reverse-time sampling becomes a finite-horizon control problem, and the score function emerges as the optimal drift correction.
  • The post assumes comfort with SDEs, PDEs, and convex duality, so it is more of an advanced bridge piece than a beginner-friendly walkthrough.
  • HN traction suggests there is still a hungry audience for rigorous AI math when it pays off with a unifying story.
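The closed-form-checkable structure the LQR bullet points to can be seen in a toy scalar example. This is an illustrative sketch under our own parameter choices, not the post's benchmark code: the quadratic value-function ansatz turns the HJB equation into a scalar Riccati equation, and the residual check confirms the solution.

```python
import math

# Scalar infinite-horizon discounted stochastic LQR (illustrative):
#   dynamics  dx = (a*x + b*u) dt + sigma dW
#   cost      E ∫ e^{-rho t} (q x^2 + r u^2) dt
# The ansatz V(x) = p x^2 + c reduces the HJB equation to a scalar
# Riccati quadratic:  (b^2/r) p^2 - (2a - rho) p - q = 0,
# with optimal feedback  u*(x) = -(b p / r) x.

a, b, sigma = -0.5, 1.0, 0.3   # hypothetical dynamics parameters
q, r, rho = 1.0, 0.5, 0.1      # hypothetical cost weights and discount

A2 = b * b / r
disc = (2 * a - rho) ** 2 + 4 * A2 * q
p = ((2 * a - rho) + math.sqrt(disc)) / (2 * A2)  # positive Riccati root
c = sigma ** 2 * p / rho                          # constant from the diffusion

def hjb_residual(x):
    """rho*V - [running cost + drift*V' + 0.5*sigma^2*V''] at state x."""
    u = -(b * p / r) * x                     # candidate optimal control
    V, Vx, Vxx = p * x * x + c, 2 * p * x, 2 * p
    return rho * V - (q * x * x + r * u * u
                      + (a * x + b * u) * Vx
                      + 0.5 * sigma ** 2 * Vxx)

# Residual vanishes (up to float error) at every state iff (p, c) solve HJB.
print(max(abs(hjb_residual(x)) for x in (-2.0, -0.5, 0.0, 1.0, 3.0)))
```

A neural policy-iteration scheme like the post's can be validated the same way: train the critic, then check its HJB residual and feedback gain against the Riccati solution.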
// TAGS
continuous-rl · research · agent

DISCOVERED

12d ago (2026-03-30)

PUBLISHED

13d ago (2026-03-30)

RELEVANCE

7 / 10

AUTHOR

sebzuddas