Momentum Gradient Descent
Complete Solution with Step-by-Step Calculations
1. Problem Statement
📋 Minimize the Function
f(x) = (x - 4)²
Gradient:
f'(x) = 2(x - 4)
Goal: Use Momentum Gradient Descent to find the minimum at x* = 4
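As a quick sanity check, the objective and its gradient can be coded directly (the names `f` and `df` are mine, not from the problem statement):

```python
def f(x):
    """Objective: f(x) = (x - 4)^2, minimized at x* = 4."""
    return (x - 4) ** 2

def df(x):
    """Analytic gradient: f'(x) = 2(x - 4)."""
    return 2 * (x - 4)

# Both the function value and the gradient vanish at the minimum:
print(f(4), df(4))   # 0 0
print(df(10))        # 12 (gradient at the starting point)
```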
2. Hyperparameters Configuration
Learning Rate
α = 0.2
Controls step size in descent direction
Momentum Factor
β = 0.9
Fraction of past velocity retained each step (values near 1 = strong momentum)
Initial Position
x₀ = 10
Starting point (far from minimum)
Initial Velocity
v₀ = 0
No initial momentum
3. Update Rules
🔄 Momentum Gradient Descent Equations
Velocity Update
v(t) = β·v(t-1) - α·f'(x(t))
Momentum term (βv) + Gradient term (-αf')
Position Update
x(t+1) = x(t) + v(t)
Move in direction of accumulated velocity
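The two update rules translate into a few lines of Python. This is a sketch; the helper name `momentum_step` and the hard-coded gradient of f(x) = (x - 4)² are my own choices:

```python
def momentum_step(x, v, alpha=0.2, beta=0.9):
    """One momentum GD update:
        v_new = beta * v - alpha * f'(x)
        x_new = x + v_new
    """
    grad = 2 * (x - 4)              # f'(x) for f(x) = (x - 4)^2
    v_new = beta * v - alpha * grad
    return x + v_new, v_new

# Reproduce Iteration 1 from x0 = 10, v0 = 0:
x1, v1 = momentum_step(10.0, 0.0)
print(x1, v1)   # x1 ≈ 7.6, v1 ≈ -2.4
```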
4. Step-by-Step Iterations
➊ Iteration 1: First Step
Gradient Calculation
g₁ = f'(x₀) = 2(10 - 4) = 12
Velocity Update
v₁ = 0.9(0) - 0.2(12) = -2.4
✓ With v₀ = 0, the first step is a pure gradient step
Position Update
x₁ = 10 + (-2.4) = 7.6
➋ Iteration 2: Acceleration Phase
Gradient Calculation
g₂ = 2(7.6 - 4) = 7.2
Velocity Update
v₂ = 0.9(-2.4) - 0.2(7.2) = -2.16 - 1.44 = -3.6
✓ Momentum accumulates! |v₂| > |v₁| (the velocity is more negative, so the step is larger)
Position Update
x₂ = 7.6 + (-3.6) = 4.0
✓✓ OPTIMUM REACHED (x* = 4)
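Iterations 1 and 2 can be verified with a two-step loop (hyperparameters as configured above; variable names are mine):

```python
alpha, beta = 0.2, 0.9
x, v = 10.0, 0.0          # x0 = 10, v0 = 0
for _ in range(2):
    g = 2 * (x - 4)       # f'(x)
    v = beta * v - alpha * g
    x = x + v
print(x, v)               # x ≈ 4.0 (the optimum), v ≈ -3.6
```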
➌ Iteration 3: Overshooting
Gradient Calculation
g₃ = 2(4 - 4) = 0
⚠️ Gradient is zero at optimum, but momentum continues!
Velocity Update
v₃ = 0.9(-3.6) - 0.2(0) = -3.24
Position Update
x₃ = 4.0 + (-3.24) = 0.76
→ Overshoots past optimum due to momentum!
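The overshoot is easy to see in code: at x = 4 the gradient term contributes nothing, so the update is driven entirely by the retained velocity (starting values taken from the end of Iteration 2):

```python
x, v = 4.0, -3.6          # state after Iteration 2
g = 2 * (x - 4)           # 0.0 — gradient vanishes at the optimum
v = 0.9 * v - 0.2 * g     # ≈ -3.24: pure momentum, no gradient contribution
x = x + v                 # ≈ 0.76: carried past x* = 4
print(g, v, x)
```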
➍ Iteration 4: Correction Phase
Gradient Calculation
g₄ = 2(0.76 - 4) = -6.48
Gradient now points back toward x* = 4
Velocity Update
v₄ = 0.9(-3.24) - 0.2(-6.48) = -2.916 + 1.296 = -1.62
✓ |v| shrinks because the gradient now opposes the velocity (correction begins)
Position Update
x₄ = 0.76 + (-1.62) = -0.86
5. Summary Table - All Iterations

| t | xₜ    | gₜ = f'(xₜ) | vₜ    | xₜ₊₁  | f(xₜ₊₁) |
|---|-------|-------------|-------|-------|---------|
| 1 | 10.00 |  12.00      | -2.40 |  7.60 | 12.96   |
| 2 |  7.60 |   7.20      | -3.60 |  4.00 |  0.00   |
| 3 |  4.00 |   0.00      | -3.24 |  0.76 | 10.50   |
| 4 |  0.76 |  -6.48      | -1.62 | -0.86 | 23.62   |
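The full four-iteration trajectory can be reproduced with a short loop (a sketch using the hyperparameters from Section 2; formatting choices are mine):

```python
alpha, beta = 0.2, 0.9
x, v = 10.0, 0.0
print(f"{'t':>2} {'x_t':>8} {'grad':>8} {'v_t':>8} {'x_t+1':>8}")
for t in range(1, 5):
    g = 2 * (x - 4)           # f'(x_t)
    v = beta * v - alpha * g  # velocity update
    x_new = x + v             # position update
    print(f"{t:>2} {x:8.2f} {g:8.2f} {v:8.2f} {x_new:8.2f}")
    x = x_new
```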
6. Key Observations & Insights
✅ Success & Convergence
- Iteration 2: The iterate lands exactly on the optimum x* = 4, where f(x*) = 0
- Momentum effect: Velocity accumulates across steps, producing larger moves than standard GD with the same learning rate
- Caveat: The algorithm does not stop there — the accumulated velocity carries it past x* (iterations 3-4). Because β < 1 the velocity decays, so the iterates oscillate around x* = 4 with shrinking amplitude and eventually settle there
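To make the "faster than standard GD" claim concrete, the sketch below (my own check, not part of the original worked example) compares plain gradient descent and momentum GD after two steps with the same learning rate:

```python
alpha, beta = 0.2, 0.9
x_gd = 10.0               # plain gradient descent
x_m, v = 10.0, 0.0        # momentum gradient descent
for _ in range(2):
    x_gd = x_gd - alpha * 2 * (x_gd - 4)   # plain GD: x -= alpha * f'(x)
    g = 2 * (x_m - 4)
    v = beta * v - alpha * g
    x_m = x_m + v
print(x_gd, x_m)          # plain GD ≈ 6.16, momentum ≈ 4.0
```

After two iterations plain GD is still at x ≈ 6.16, while momentum has already reached x ≈ 4.0 — though, as Section 4 shows, momentum then overshoots where plain GD would not.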