Part II: The Frank-Wolfe Algorithm

1. Objective and General Idea

🎯 The Objective

The Frank-Wolfe method (also known as the conditional gradient method) is used to solve optimization problems of the form:

$$ \min_{x \in D} f(x) $$

Where:

  • \(f(x)\) is a convex and differentiable function (its graph is "bowl-shaped").
  • \(D\) is a convex and compact constraint set (a region with no holes, like a polygon or a disk).

🧭 The General Idea: Moving Towards Vertices

Unlike standard gradient descent methods, which can "step outside" the allowed region and require a "projection" step to get back in, Frank-Wolfe is projection-free. Each update is a convex combination of the current point and a point of \(D\), so the iterates are guaranteed to stay within the feasible set \(D\) at all times.

The Frank-Wolfe Trick

At each step, instead of blindly following the descent direction (the gradient), the algorithm asks a simple question:

"In the direction opposite to the gradient (the best local descent direction), what is the most extreme point in my entire allowed region \(D\)?"

For a linear objective over a compact convex set, the minimum is always attained at an extreme point; when \(D\) is a polytope, that means a vertex. The algorithm then takes a small step in the direction of that vertex, transforming a difficult problem into a series of very simple linear subproblems.

2. The Steps of the Algorithm

Step 1: Choose a Feasible Starting Point

Choose a point \(x_0\) that respects the constraints (\(x_0 \in D\)). A vertex of the feasible set is often a good choice.

Step 2: Calculate the Gradient

Compute the gradient \(\nabla f(x_k)\) at the current point. This vector points in the direction where the function increases the fastest. To minimize, we will go in the opposite direction, \(-\nabla f(x_k)\).

Step 3: Solve the Auxiliary Linear Problem (LMO)

This is the key step, the Linear Minimization Oracle (LMO). We solve:

$$ s_k = \arg\min_{s \in D} \langle s, \nabla f(x_k) \rangle $$

This consists of finding the point \(s_k\) in the domain \(D\) that is "furthest" in the direction opposite to the gradient. Because the objective \( \langle s, \nabla f(x_k) \rangle \) is linear in \(s\), a minimizer is always attained at an extreme point of \(D\); when \(D\) is a polytope, we just need to test its vertices.
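When \(D\) is given by an explicit list of vertices, the LMO is a single pass over them. A minimal sketch in Python (the vertex list and gradient below are taken from the worked example in Section 4):

```python
import numpy as np

def lmo(vertices, grad):
    """Linear Minimization Oracle over a polytope given by its vertices:
    returns the vertex s minimizing the inner product <s, grad>."""
    scores = vertices @ grad            # <s, grad> for every vertex at once
    return vertices[np.argmin(scores)]

# Vertices of the triangular feasible region from Section 4
V = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
g = np.array([-2.0, -2.4])              # gradient of f at (0, 0)
print(lmo(V, g))                        # -> [0. 3.]
```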

Step 4: Determine the Direction of Movement

The direction is simply the vector connecting the current point \(x_k\) to the target vertex \(s_k\).

$$ d_k = s_k - x_k $$

Step 5: Find the Optimal Step Size \(\gamma_k\)

We look for the best step size \(\gamma_k \in [0, 1]\) that minimizes \(f\) along the line segment between \(x_k\) and \(s_k\). We therefore solve a minimization problem in a single variable:

$$ \gamma_k = \arg\min_{\gamma \in [0, 1]} f(x_k + \gamma d_k) $$

(Several methods for choosing \(\gamma\) exist, see Section 3).

Step 6: Update the Point

We move by a step of size \(\gamma_k\) in the direction \(d_k\).

$$ x_{k+1} = x_k + \gamma_k d_k $$

Step 7: Check for Convergence

We stop if the point barely moves (\( \|x_{k+1} - x_k\| < \varepsilon \)) or if the Frank-Wolfe gap \( g_k = \langle -\nabla f(x_k),\, s_k - x_k \rangle \) falls below a tolerance; by convexity, this gap is an upper bound on \( f(x_k) - f(x^*) \), so it certifies how close we are to optimal. Otherwise, we return to Step 2.
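Steps 1–7 can be assembled into one short loop. The sketch below uses the diminishing step \( \gamma_k = 2/(k+2) \) described in Section 3, and demonstrates it on the objective and feasible set of the worked example in Section 4:

```python
import numpy as np

def frank_wolfe(grad_f, vertices, x0, max_iter=500, eps=1e-8):
    """Frank-Wolfe over a polytope (given by its vertices) with the
    diminishing step size gamma_k = 2/(k+2)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)                          # Step 2: gradient at x_k
        s = vertices[np.argmin(vertices @ g)]  # Step 3: LMO -> best vertex
        d = s - x                              # Step 4: direction
        gamma = 2.0 / (k + 2)                  # Step 5: diminishing step
        x_next = x + gamma * d                 # Step 6: update (stays in D)
        if np.linalg.norm(x_next - x) < eps:   # Step 7: convergence check
            return x_next
        x = x_next
    return x

# min (x-1)^2 + (y-1.2)^2 over the triangle with the vertices below
grad = lambda v: np.array([2 * (v[0] - 1), 2 * (v[1] - 1.2)])
V = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
x_star = frank_wolfe(grad, V, x0=[0.0, 0.0])
print(x_star)   # approaches (1, 1.2), the (feasible) unconstrained minimum
```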

3. Focus: How to Choose the Step Size \(\gamma\)?

The choice of step size \(\gamma\) is crucial for the algorithm's performance. Here are the most common methods:

1️⃣ Exact Line Search

This is the most precise method. We analytically solve the minimization problem for \(\gamma\) by differentiating the function \( \phi(\gamma) = f(x_k + \gamma d_k) \) and finding where the derivative is zero (\(\phi'(\gamma) = 0\)).

  • Advantage: Fastest convergence in terms of number of iterations.
  • Disadvantage: Can be computationally expensive if the derivative is complex.
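For a quadratic objective like the one in the worked example (Section 4), \( \phi'(\gamma) = 0 \) has a closed-form solution: with Hessian \(2I\), \( \gamma^* = -\langle \nabla f(x_k), d_k \rangle / (2\|d_k\|^2) \), clipped to \([0, 1]\). A sketch under that assumption:

```python
import numpy as np

def exact_step_quadratic(grad_x, d):
    """Exact line search for f(x, y) = (x-1)^2 + (y-1.2)^2, whose Hessian
    is 2I: solves phi'(gamma) = 0 in closed form and clips to [0, 1]."""
    gamma = -np.dot(grad_x, d) / (2 * np.dot(d, d))
    return float(np.clip(gamma, 0.0, 1.0))

# Iteration 0 of the worked example: x0 = (0, 0), s0 = (0, 3)
g0 = np.array([-2.0, -2.4])   # gradient at x0
d0 = np.array([0.0, 3.0])     # direction d0 = s0 - x0
print(round(exact_step_quadratic(g0, d0), 6))   # -> 0.4
```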

2️⃣ Diminishing Step Size

We use a predefined sequence that guarantees convergence. The standard choice is:

$$ \gamma_k = \frac{2}{k+2} $$

The sequence is thus: \(1, 2/3, 1/2, 2/5, \dots\). The steps become smaller and smaller.

  • Advantage: Very simple to implement, guaranteed convergence.
  • Disadvantage: Can be slower in practice than exact line search.
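The schedule is easy to generate (a one-line sketch):

```python
# First values of gamma_k = 2/(k+2) for k = 0, 1, 2, 3, 4
gammas = [2 / (k + 2) for k in range(5)]
print([round(g, 3) for g in gammas])   # -> [1.0, 0.667, 0.5, 0.4, 0.333]
```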

3️⃣ Fixed Step Size

We choose a constant value, for example \(\gamma_k = 0.1\). This is the simplest method.

  • Advantage: Extremely simple, no extra computation.
  • Disadvantage: Risk of slow convergence or oscillations if the step size is poorly chosen.

4. Detailed Complete Example

Let's solve the following problem:

$$ \min f(x, y) = (x-1)^2 + (y-1.2)^2 $$

Subject to the constraints:

$$ x + y \le 3, \qquad x \ge 0, \qquad y \ge 0 $$

The vertices of the feasible region \(D\) are (0,0), (3,0), and (0,3).

Iteration 0 (k=0)

  1. Initial Point: We choose \( x_0 = (0, 0) \).
  2. Gradient: \( \nabla f(x,y) = [2(x-1), 2(y-1.2)] \).
    At \(x_0\), we have \( \nabla f(0,0) = [-2, -2.4] \).
  3. LMO: We seek \( s_0 = \arg\min_{s \in D} \langle s, [-2, -2.4] \rangle \).
    We need to minimize \(-2x - 2.4y\). We test the vertices:
    • (0,0): -2(0) - 2.4(0) = 0
    • (3,0): -2(3) - 2.4(0) = -6
    • (0,3): -2(0) - 2.4(3) = -7.2 (Minimum)
    Therefore, \( s_0 = (0, 3) \).
  4. Direction: \( d_0 = s_0 - x_0 = (0,3) - (0,0) = (0,3) \).
  5. Optimal Step (Exact Line Search): We want to minimize \( \phi(\gamma) = f(x_0 + \gamma d_0) = f(0, 3\gamma) \).
    \( \phi(\gamma) = (0-1)^2 + (3\gamma - 1.2)^2 = 1 + (3\gamma - 1.2)^2 \).
    We differentiate: \( \phi'(\gamma) = 2 \times 3 \times (3\gamma - 1.2) = 18\gamma - 7.2 \).
    We solve \( \phi'(\gamma) = 0 \Rightarrow 18\gamma = 7.2 \Rightarrow \gamma = 0.4 \).
    The optimal step is \( \gamma_0 = 0.4 \).
  6. Update: \( x_1 = x_0 + \gamma_0 d_0 = (0,0) + 0.4 \times (0,3) = (0, 1.2) \).

Iteration 1 (k=1)

  1. Current Point: \( x_1 = (0, 1.2) \).
  2. Gradient: \( \nabla f(0, 1.2) = [2(0-1), 2(1.2-1.2)] = [-2, 0] \).
  3. LMO: We seek \( s_1 = \arg\min_{s \in D} \langle s, [-2, 0] \rangle \).
    We need to minimize \(-2x\). We test the vertices:
    • (0,0): -2(0) = 0
    • (3,0): -2(3) = -6 (Minimum)
    • (0,3): -2(0) = 0
    Therefore, \( s_1 = (3, 0) \).
  4. Direction: \( d_1 = s_1 - x_1 = (3,0) - (0,1.2) = (3, -1.2) \).
  5. Optimal Step: We minimize \( \phi(\gamma) = f(x_1 + \gamma d_1) = f(3\gamma, 1.2 - 1.2\gamma) \).
    \( \phi(\gamma) = (3\gamma-1)^2 + (1.2-1.2\gamma-1.2)^2 = (3\gamma-1)^2 + (-1.2\gamma)^2 \).
    \( \phi'(\gamma) = 2 \times 3 \times (3\gamma-1) + 2 \times (-1.2) \times (-1.2\gamma) = 18\gamma - 6 + 2.88\gamma = 20.88\gamma - 6 \).
    We solve \( \phi'(\gamma) = 0 \Rightarrow \gamma = 6 / 20.88 \approx 0.287 \).
    The optimal step is \( \gamma_1 \approx 0.287 \).
  6. Update: \( x_2 = (0, 1.2) + 0.287 \times (3, -1.2) \approx (0.862, 0.855) \).

The algorithm will continue in this manner, getting closer to the optimal solution with each iteration.
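The two iterations above can be reproduced numerically. This sketch reuses the closed-form exact step for this quadratic \(f\) (Hessian \(2I\), so \( \gamma = -\langle \nabla f(x_k), d_k \rangle / (2\|d_k\|^2) \), clipped to \([0, 1]\)):

```python
import numpy as np

grad = lambda v: np.array([2 * (v[0] - 1), 2 * (v[1] - 1.2)])
V = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])     # vertices of D

x = np.array([0.0, 0.0])                               # x0
for k in range(2):
    g = grad(x)
    s = V[np.argmin(V @ g)]                            # LMO: best vertex
    d = s - x                                          # direction
    gamma = min(1.0, max(0.0, -(g @ d) / (2 * d @ d))) # exact line search
    x = x + gamma * d
    print(f"x_{k + 1} = {np.round(x, 3)}")             # x_1 = (0, 1.2), x_2 ~ (0.862, 0.855)
```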



Last modified: Sunday, 2 November 2025, 10:53 PM