Markov Decision Process (MDP)

A Markov Decision Process is a mathematical framework for sequential decision-making under uncertainty. An MDP consists of: (1) a state space S; (2) an action space A; (3) transition probabilities P(s'|s,a); (4) a reward function R(s,a); (5) a discount factor γ ∈ [0,1). The goal is to find an optimal policy π*: S → A that maximizes expected cumulative discounted reward. The optimal value function satisfies the Bellman optimality equation: V*(s) = max_a [R(s,a) + γ Σ_{s'} P(s'|s,a) V*(s')]. Solution methods include value iteration, policy iteration, linear programming, and reinforcement learning algorithms (Q-learning, SARSA, policy gradients). Applications include robotics control, inventory management, finance, and healthcare treatment planning.
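The Bellman equation above can be solved by value iteration: repeatedly apply the right-hand side as an update until the value function stops changing. A minimal sketch, using a hypothetical two-state, two-action MDP whose transition and reward numbers are purely illustrative:

```python
import numpy as np

# Hypothetical MDP (all numbers illustrative, not from any real system).
# P[a, s, s'] = P(s'|s,a); R[s, a] = R(s,a).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5], [0.0, 1.0]],   # transitions under action 1
])
R = np.array([
    [1.0, 0.0],   # rewards in state 0 for actions 0, 1
    [0.0, 2.0],   # rewards in state 1 for actions 0, 1
])
gamma = 0.9

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality backup until convergence."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R(s,a) + gamma * sum_{s'} P(s'|s,a) * V(s')
        Q = R + gamma * np.einsum('asq,q->sa', P, V)
        V_new = Q.max(axis=1)            # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    policy = Q.argmax(axis=1)            # greedy policy w.r.t. V*
    return V_new, policy

V_star, pi_star = value_iteration(P, R, gamma)
print("V*:", V_star, "policy:", pi_star)
```

Because the backup is a γ-contraction, the loop converges to the unique fixed point V* regardless of the starting values; the greedy policy extracted at the end is then optimal.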
