Momentum Gradient Descent

Complete Solution with Step-by-Step Calculations

1. Problem Statement

📋 Minimize the Function

f(x) = (x - 4)²

Gradient:

f'(x) = 2(x - 4)

Goal: Use Momentum Gradient Descent to find the minimum at x* = 4

2. Hyperparameters Configuration

Learning Rate

α = 0.2

Controls step size in descent direction

Momentum Factor

β = 0.9

Accumulates past gradients as velocity (higher β = stronger inertia)

Initial Position

x₀ = 10

Starting point (far from minimum)

Initial Velocity

v₀ = 0

No initial momentum

3. Update Rules

🔄 Momentum Gradient Descent Equations

Velocity Update

v(t) = β·v(t-1) - α·f'(x(t))

Momentum term (βv) + Gradient term (-αf')

Position Update

x(t+1) = x(t) + v(t)

Move in direction of accumulated velocity
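The two update rules can be sketched as a small Python helper (the function name `momentum_step` is illustrative, not from a library):

```python
def momentum_step(x, v, grad_fn, alpha=0.2, beta=0.9):
    """One step of momentum gradient descent:
       v(t)   = beta * v(t-1) - alpha * f'(x(t))
       x(t+1) = x(t) + v(t)
    """
    g = grad_fn(x)          # gradient at the current position
    v = beta * v - alpha * g  # accumulate velocity
    return x + v, v           # move along the accumulated velocity

# f(x) = (x - 4)^2, so f'(x) = 2(x - 4)
grad = lambda x: 2 * (x - 4)

x, v = 10.0, 0.0            # x0 = 10, v0 = 0
x, v = momentum_step(x, v, grad)
print(round(x, 4), round(v, 4))  # 7.6 -2.4, matching iteration 1 below
```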

4. Step-by-Step Iterations

➊ Iteration 1: First Step

Gradient Calculation

g₁ = f'(x₀) = 2(10 - 4) = 12

Velocity Update

v₁ = 0.9(0) - 0.2(12) = -2.4

✓ Since v₀ = 0, the first step is a pure gradient step

Position Update

x₁ = 10 + (-2.4) = 7.6

➋ Iteration 2: Acceleration Phase

Gradient Calculation

g₂ = 2(7.6 - 4) = 7.2

Velocity Update

v₂ = 0.9(-2.4) - 0.2(7.2) = -2.16 - 1.44 = -3.6

✓ Momentum accumulates! |v₂| > |v₁| (velocity becomes more negative)

Position Update

x₂ = 7.6 + (-3.6) = 4.0

✓✓ OPTIMUM REACHED (x* = 4)

➌ Iteration 3: Overshooting

Gradient Calculation

g₃ = 2(4 - 4) = 0

⚠️ Gradient is zero at optimum, but momentum continues!

Velocity Update

v₃ = 0.9(-3.6) - 0.2(0) = -3.24

Position Update

x₃ = 4.0 - 3.24 = 0.76

→ Overshoots past optimum due to momentum!

➍ Iteration 4: Correction Phase

Gradient Calculation

g₄ = 2(0.76 - 4) = -6.48

Gradient now points back toward x* = 4

Velocity Update

v₄ = 0.9(-3.24) - 0.2(-6.48) = -2.916 + 1.296 = -1.62

✓ Velocity magnitude shrinks (|v₄| < |v₃|): the correction begins

Position Update

x₄ = 0.76 - 1.62 = -0.86

5. Summary Table - All Iterations

Iteration | x(t)  | Gradient g(t) | Velocity v(t) | f(x)
----------|-------|---------------|---------------|-------
    0     | 10.00 |  12.00        |   0.00        | 36.00
    1     |  7.60 |   7.20        |  -2.40        | 12.96
    2     |  4.00 |   0.00        |  -3.60        |  0.00 ✓
    3     |  0.76 |  -6.48        |  -3.24        | 10.50
    4     | -0.86 |  -9.72        |  -1.62        | 23.62
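The whole table can be regenerated with a short loop (a sketch; the variable names are illustrative):

```python
alpha, beta = 0.2, 0.9
f = lambda x: (x - 4) ** 2      # objective
grad = lambda x: 2 * (x - 4)    # its gradient

x, v = 10.0, 0.0                # x0 = 10, v0 = 0
print(f"{'t':>2} {'x(t)':>8} {'g(t)':>8} {'v(t)':>8} {'f(x)':>8}")
for t in range(5):
    g = grad(x)
    print(f"{t:>2} {x:>8.2f} {g:>8.2f} {v:>8.2f} {f(x):>8.2f}")
    v = beta * v - alpha * g    # velocity update
    x = x + v                   # position update
```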

6. Key Observations & Insights

✅ Success & Convergence

  • Iteration 2: Algorithm reaches exact optimum x* = 4 with f(x*) = 0
  • Momentum effect: Velocity accumulates, enabling faster descent than standard GD
  • Convergence speed: the exact minimum is hit after only 2 iterations, but accumulated momentum then carries the iterate past x*, so it oscillates around the minimum before settling
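Running the same update for many more iterations (a sketch under the same hyperparameters) shows the oscillation dying out and x settling at the optimum:

```python
alpha, beta = 0.2, 0.9
grad = lambda x: 2 * (x - 4)

x, v = 10.0, 0.0
for t in range(500):
    v = beta * v - alpha * grad(x)  # velocity update
    x += v                          # position update

print(round(x, 6))  # prints 4.0: the oscillation has decayed
```

The decay is geometric: each oscillation around x* = 4 is smaller than the last, because the gradient term keeps pulling the velocity back toward zero.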
Last modified: Monday, 8 December 2025, 3:04 AM