Hasty Briefs


Hamilton-Jacobi-Bellman Equation: Reinforcement Learning and Diffusion Models

3 hours ago
  • #Hamilton-Jacobi-Bellman equation
  • #optimal control
  • #reinforcement learning
  • Richard Bellman's 1952 work on dynamic programming laid the foundation for optimal control and reinforcement learning.
  • Bellman extended dynamic programming to continuous-time systems, linking it to the 19th-century Hamilton-Jacobi equation.
  • The Hamilton-Jacobi-Bellman (HJB) equation is key for continuous-time control, derived from dynamic programming principles.
  • Continuous-time reinforcement learning builds policy iteration and Q-learning methods on the HJB equation.
  • Stochastic LQR and Merton portfolio problems serve as benchmarks with closed-form solutions for validating algorithms.
  • Diffusion models can be interpreted as stochastic optimal control problems, where the optimal control is the score function.
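For reference, the HJB equation mentioned above can be stated as follows for a controlled diffusion; the notation (drift $b$, diffusion $\sigma$, running cost $\ell$, terminal cost $g$) is a standard choice assumed here, not taken from the article:

```latex
% Value function over a horizon [t, T]:
%   V(x,t) = \min_{u(\cdot)} \; \mathbb{E}\Big[ \int_t^T \ell(x_s, u_s)\,ds + g(x_T) \Big],
% subject to dx_s = b(x_s, u_s)\,ds + \sigma(x_s)\,dW_s.
%
% The stochastic HJB equation:
\frac{\partial V}{\partial t}
  + \min_{u} \Big[ \ell(x, u) + b(x, u)^\top \nabla_x V
  + \tfrac{1}{2}\,\mathrm{tr}\big(\sigma \sigma^\top \nabla_x^2 V\big) \Big] = 0,
\qquad V(x, T) = g(x).
```

Dropping the second-order (trace) term recovers the deterministic HJB equation; the minimization over $u$ is what distinguishes it from the classical Hamilton-Jacobi equation.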
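The connection between diffusion models and optimal control in the last point can be made concrete through the reverse-time SDE; the notation below (forward drift $f$, noise scale $g$, score $\nabla_x \log p_t$) follows common score-based modeling conventions and is assumed rather than quoted from the article:

```latex
% Forward (noising) SDE:
%   dx = f(x, t)\,dt + g(t)\,dW_t.
%
% Reverse-time (generative) SDE, driven by the score:
dx = \big[ f(x, t) - g(t)^2 \,\nabla_x \log p_t(x) \big]\,dt + g(t)\,d\bar{W}_t.
```

Viewing sampling as a stochastic control problem, the drift correction $g(t)^2 \nabla_x \log p_t(x)$ plays the role of the control, so solving the associated HJB equation identifies the optimal control with the score function.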
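The stochastic LQR benchmark has a closed-form solution via the Riccati equation, which is why it is useful for validating algorithms. A minimal sketch using SciPy, with illustrative dynamics and cost matrices chosen here as assumptions (the article does not specify a particular system):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed double-integrator dynamics: dx = (A x + B u) dt + noise dW,
# with quadratic running cost x^T Q x + u^T R u.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

# Solve the continuous-time algebraic Riccati equation:
#   A^T P + P A - P B R^{-1} B^T P + Q = 0
P = solve_continuous_are(A, B, Q, R)

# Optimal feedback law u*(x) = -K x; for additive noise the gain is the
# same as in the deterministic case (the noise only adds a constant
# to the value function).
K = np.linalg.solve(R, B.T @ P)

# Check the Riccati residual is numerically zero.
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print(np.max(np.abs(residual)))
```

A learned value function or policy can then be compared against `P` and `K` directly, which is what makes LQR a convenient ground-truth benchmark.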