Part II: The Frank-Wolfe Algorithm

1. Objective and General Idea

🎯 The Objective

The Frank-Wolfe method (also known as the conditional gradient method) is used to solve optimization problems of the form:

$$ \min_{x \in D} f(x) $$

Where:

  • \(f(x)\) is a convex and differentiable function (its graph is "bowl-shaped").
  • \(D\) is a convex and compact constraint set (a region with no holes, like a polygon or a disk).

🧭 The General Idea: Moving Towards Vertices

Unlike standard gradient descent methods, which can "step outside" the allowed region and require a "projection" step to get back in, Frank-Wolfe is projection-free. Each update is a convex combination of the current point and a point of \(D\), so the iterates are guaranteed to stay within the feasible set \(D\) at all times.

The Frank-Wolfe Trick

At each step, instead of blindly following the descent direction (the gradient), the algorithm asks a simple question:

"In the direction opposite to the gradient (the best local descent direction), what is the most extreme point in my entire allowed region \(D\)?"

For a linear objective over a compact convex set, the minimum is always attained at an extreme point; when \(D\) is a polytope, that means a vertex. The algorithm then takes a small step in the direction of that vertex, transforming a difficult problem into a series of very simple linear subproblems.

2. The Steps of the Algorithm

Step 1: Choose a Feasible Starting Point

Choose a point \(x_0\) that respects the constraints (\(x_0 \in D\)). A vertex of the feasible set is often a good choice.

Step 2: Calculate the Gradient

Compute the gradient \(\nabla f(x_k)\) at the current point. This vector points in the direction where the function increases the fastest. To minimize, we will go in the opposite direction, \(-\nabla f(x_k)\).

Step 3: Solve the Auxiliary Linear Problem (LMO)

This is the key step, the Linear Minimization Oracle (LMO). We solve:

$$ s_k = \arg\min_{s \in D} \langle s, \nabla f(x_k) \rangle $$

This consists of finding the point \(s_k\) in the domain \(D\) that is "furthest" in the direction opposite to the gradient. Because the objective \( \langle s, \nabla f(x_k) \rangle \) is linear in \(s\), a minimizer is always attained at an extreme point of \(D\); when \(D\) is a polytope, we just need to test its vertices.
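When \(D\) is given by an explicit list of vertices, the LMO is a single pass over them. A minimal sketch in Python (the vertex list and gradient below are taken from the worked example in Section 4):

```python
import numpy as np

def lmo(vertices, grad):
    """Linear Minimization Oracle over a polytope given by its vertices:
    returns the vertex s minimizing the inner product <s, grad>."""
    scores = vertices @ grad            # <s, grad> for every vertex at once
    return vertices[np.argmin(scores)]

# Vertices of the triangular feasible region from Section 4
V = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
g = np.array([-2.0, -2.4])              # gradient of f at (0, 0)
print(lmo(V, g))                        # -> [0. 3.]
```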

Step 4: Determine the Direction of Movement

The direction is simply the vector connecting the current point \(x_k\) to the target vertex \(s_k\).

$$ d_k = s_k - x_k $$

Step 5: Find the Optimal Step Size \(\gamma_k\)

We look for the best step size \(\gamma_k \in [0, 1]\) that minimizes \(f\) along the line segment between \(x_k\) and \(s_k\). We therefore solve a minimization problem in a single variable:

$$ \gamma_k = \arg\min_{\gamma \in [0, 1]} f(x_k + \gamma d_k) $$

(Several methods for choosing \(\gamma\) exist, see Section 3).

Step 6: Update the Point

We move by a step of size \(\gamma_k\) in the direction \(d_k\).

$$ x_{k+1} = x_k + \gamma_k d_k $$

Step 7: Check for Convergence

We stop if the point barely moves (\( \|x_{k+1} - x_k\| < \varepsilon \)) or if the Frank-Wolfe gap \( g_k = \langle -\nabla f(x_k),\, s_k - x_k \rangle \) falls below a tolerance; by convexity, this gap is an upper bound on \( f(x_k) - f(x^*) \), so it certifies how close we are to optimal. Otherwise, we return to Step 2.
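Steps 1–7 can be assembled into one short loop. The sketch below uses the diminishing step \( \gamma_k = 2/(k+2) \) described in Section 3, and demonstrates it on the objective and feasible set of the worked example in Section 4:

```python
import numpy as np

def frank_wolfe(grad_f, vertices, x0, max_iter=500, eps=1e-8):
    """Frank-Wolfe over a polytope (given by its vertices) with the
    diminishing step size gamma_k = 2/(k+2)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)                          # Step 2: gradient at x_k
        s = vertices[np.argmin(vertices @ g)]  # Step 3: LMO -> best vertex
        d = s - x                              # Step 4: direction
        gamma = 2.0 / (k + 2)                  # Step 5: diminishing step
        x_next = x + gamma * d                 # Step 6: update (stays in D)
        if np.linalg.norm(x_next - x) < eps:   # Step 7: convergence check
            return x_next
        x = x_next
    return x

# min (x-1)^2 + (y-1.2)^2 over the triangle with the vertices below
grad = lambda v: np.array([2 * (v[0] - 1), 2 * (v[1] - 1.2)])
V = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
x_star = frank_wolfe(grad, V, x0=[0.0, 0.0])
print(x_star)   # approaches (1, 1.2), the (feasible) unconstrained minimum
```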

3. Focus: How to Choose the Step Size \(\gamma\)?

The choice of step size \(\gamma\) is crucial for the algorithm's performance. Here are the most common methods:

1️⃣ Exact Line Search

This is the most precise method. We analytically solve the minimization problem for \(\gamma\) by differentiating the function \( \phi(\gamma) = f(x_k + \gamma d_k) \) and finding where the derivative is zero (\(\phi'(\gamma) = 0\)).

  • Advantage: Fastest convergence in terms of number of iterations.
  • Disadvantage: Can be computationally expensive if the derivative is complex.
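For a quadratic objective like the one in the worked example (Section 4), \( \phi'(\gamma) = 0 \) has a closed-form solution: with Hessian \(2I\), \( \gamma^* = -\langle \nabla f(x_k), d_k \rangle / (2\|d_k\|^2) \), clipped to \([0, 1]\). A sketch under that assumption:

```python
import numpy as np

def exact_step_quadratic(grad_x, d):
    """Exact line search for f(x, y) = (x-1)^2 + (y-1.2)^2, whose Hessian
    is 2I: solves phi'(gamma) = 0 in closed form and clips to [0, 1]."""
    gamma = -np.dot(grad_x, d) / (2 * np.dot(d, d))
    return float(np.clip(gamma, 0.0, 1.0))

# Iteration 0 of the worked example: x0 = (0, 0), s0 = (0, 3)
g0 = np.array([-2.0, -2.4])   # gradient at x0
d0 = np.array([0.0, 3.0])     # direction d0 = s0 - x0
print(round(exact_step_quadratic(g0, d0), 6))   # -> 0.4
```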

2️⃣ Diminishing Step Size

We use a predefined sequence that guarantees convergence. The standard choice is:

$$ \gamma_k = \frac{2}{k+2} $$

The sequence is thus: \(1, 2/3, 1/2, 2/5, \dots\). The steps become smaller and smaller.

  • Advantage: Very simple to implement, guaranteed convergence.
  • Disadvantage: Can be slower in practice than exact line search.
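The schedule is easy to generate (a one-line sketch):

```python
# First values of gamma_k = 2/(k+2) for k = 0, 1, 2, 3, 4
gammas = [2 / (k + 2) for k in range(5)]
print([round(g, 3) for g in gammas])   # -> [1.0, 0.667, 0.5, 0.4, 0.333]
```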

3️⃣ Fixed Step Size

We choose a constant value, for example \(\gamma_k = 0.1\). This is the simplest method.

  • Advantage: Extremely simple, no extra computation.
  • Disadvantage: Risk of slow convergence or oscillations if the step size is poorly chosen.

4. Detailed Complete Example

Let's solve the following problem:

$$ \min f(x, y) = (x-1)^2 + (y-1.2)^2 $$

Subject to the constraints:

$$ x + y \le 3, \qquad x \ge 0, \qquad y \ge 0 $$

The vertices of the feasible region \(D\) are (0,0), (3,0), and (0,3).

Iteration 0 (k=0)

  1. Initial Point: We choose \( x_0 = (0, 0) \).
  2. Gradient: \( \nabla f(x,y) = [2(x-1), 2(y-1.2)] \).
    At \(x_0\), we have \( \nabla f(0,0) = [-2, -2.4] \).
  3. LMO: We seek \( s_0 = \arg\min_{s \in D} \langle s, [-2, -2.4] \rangle \).
    We need to minimize \(-2x - 2.4y\). We test the vertices:
    • (0,0): -2(0) - 2.4(0) = 0
    • (3,0): -2(3) - 2.4(0) = -6
    • (0,3): -2(0) - 2.4(3) = -7.2 (Minimum)
    Therefore, \( s_0 = (0, 3) \).
  4. Direction: \( d_0 = s_0 - x_0 = (0,3) - (0,0) = (0,3) \).
  5. Optimal Step (Exact Line Search): We want to minimize \( \phi(\gamma) = f(x_0 + \gamma d_0) = f(0, 3\gamma) \).
    \( \phi(\gamma) = (0-1)^2 + (3\gamma - 1.2)^2 = 1 + (3\gamma - 1.2)^2 \).
    We differentiate: \( \phi'(\gamma) = 2 \times 3 \times (3\gamma - 1.2) = 18\gamma - 7.2 \).
    We solve \( \phi'(\gamma) = 0 \Rightarrow 18\gamma = 7.2 \Rightarrow \gamma = 0.4 \).
    The optimal step is \( \gamma_0 = 0.4 \).
  6. Update: \( x_1 = x_0 + \gamma_0 d_0 = (0,0) + 0.4 \times (0,3) = (0, 1.2) \).

Iteration 1 (k=1)

  1. Current Point: \( x_1 = (0, 1.2) \).
  2. Gradient: \( \nabla f(0, 1.2) = [2(0-1), 2(1.2-1.2)] = [-2, 0] \).
  3. LMO: We seek \( s_1 = \arg\min_{s \in D} \langle s, [-2, 0] \rangle \).
    We need to minimize \(-2x\). We test the vertices:
    • (0,0): -2(0) = 0
    • (3,0): -2(3) = -6 (Minimum)
    • (0,3): -2(0) = 0
    Therefore, \( s_1 = (3, 0) \).
  4. Direction: \( d_1 = s_1 - x_1 = (3,0) - (0,1.2) = (3, -1.2) \).
  5. Optimal Step: We minimize \( \phi(\gamma) = f(x_1 + \gamma d_1) = f(3\gamma, 1.2 - 1.2\gamma) \).
    \( \phi(\gamma) = (3\gamma-1)^2 + (1.2-1.2\gamma-1.2)^2 = (3\gamma-1)^2 + (-1.2\gamma)^2 \).
    \( \phi'(\gamma) = 2 \times 3 \times (3\gamma-1) + 2 \times (-1.2) \times (-1.2\gamma) = 18\gamma - 6 + 2.88\gamma = 20.88\gamma - 6 \).
    We solve \( \phi'(\gamma) = 0 \Rightarrow \gamma = 6 / 20.88 \approx 0.287 \).
    The optimal step is \( \gamma_1 \approx 0.287 \).
  6. Update: \( x_2 = (0, 1.2) + 0.287 \times (3, -1.2) \approx (0.862, 0.855) \).

The algorithm will continue in this manner, getting closer to the optimal solution with each iteration.
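The two iterations above can be reproduced numerically. This sketch reuses the closed-form exact step for this quadratic \(f\) (Hessian \(2I\), so \( \gamma = -\langle \nabla f(x_k), d_k \rangle / (2\|d_k\|^2) \), clipped to \([0, 1]\)):

```python
import numpy as np

grad = lambda v: np.array([2 * (v[0] - 1), 2 * (v[1] - 1.2)])
V = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])     # vertices of D

x = np.array([0.0, 0.0])                               # x0
for k in range(2):
    g = grad(x)
    s = V[np.argmin(V @ g)]                            # LMO: best vertex
    d = s - x                                          # direction
    gamma = min(1.0, max(0.0, -(g @ d) / (2 * d @ d))) # exact line search
    x = x + gamma * d
    print(f"x_{k + 1} = {np.round(x, 3)}")             # x_1 = (0, 1.2), x_2 ~ (0.862, 0.855)
```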



Last modified: Sunday, 2 November 2025, 10:53 PM