1. Objective and General Idea
🎯 The Objective
The Frank-Wolfe method (also known as the conditional gradient method) is used to solve optimization problems of the form:
$$ \min_{x \in D} f(x) $$
Where:
- \(f(x)\) is a convex and differentiable function (its graph is "bowl-shaped").
- \(D\) is a convex and compact constraint set (a region with no holes, like a polygon or a disk).
🧭 The General Idea: Moving Towards Vertices
Unlike projected gradient methods, whose updates can "step outside" the allowed region and require a "projection" step to get back in, Frank-Wolfe is projection-free. Because each iterate is a convex combination of feasible points, the algorithm is guaranteed to stay within the feasible set \(D\) at all times.
The Frank-Wolfe Trick
At each step, instead of blindly following the descent direction (the gradient), the algorithm asks a simple question:
"In the direction opposite to the gradient (the best local descent direction), what is the most extreme point in my entire allowed region \(D\)?"
When \(D\) is a polytope, the answer is always attained at a vertex. The algorithm then takes a small step in the direction of that vertex, transforming a difficult constrained problem into a series of very simple linear subproblems.
2. The Steps of the Algorithm
Step 1: Choose a Feasible Starting Point
Choose a point \(x_0\) that respects the constraints (\(x_0 \in D\)). A vertex of the feasible set is often a good choice.
Step 2: Calculate the Gradient
Compute the gradient \(\nabla f(x_k)\) at the current point. This vector points in the direction where the function increases the fastest. To minimize, we will go in the opposite direction, \(-\nabla f(x_k)\).
Step 3: Solve the Auxiliary Linear Problem (LMO)
This is the key step, the Linear Minimization Oracle (LMO). We solve:
$$ s_k = \arg\min_{s \in D} \langle s, \nabla f(x_k) \rangle $$
This consists of finding the point \(s_k\) in the domain \(D\) that is "furthest" in the direction opposite to the gradient. When \(D\) is a polytope, this linear problem is easy to solve: we just need to test the vertices of \(D\).
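As a sketch, for a polytope represented by an explicit list of its vertices, the LMO reduces to picking the vertex with the smallest inner product against the gradient (the function name `lmo` and the triangle used for illustration are ours, not part of the original text):

```python
import numpy as np

def lmo(vertices, grad):
    """Linear Minimization Oracle over a polytope given by its vertices.

    A linear function always attains its minimum over a polytope at a
    vertex, so it suffices to score every vertex by <v, grad>.
    """
    scores = [np.dot(v, grad) for v in vertices]
    return vertices[int(np.argmin(scores))]

# Illustration: triangle with vertices (0,0), (3,0), (0,3)
vertices = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
s = lmo(vertices, np.array([-2.0, -2.4]))  # gradient pointing down-left
print(s)  # → [0. 3.]
```

Note that this brute-force scan is only practical when the vertex list is small; for structured sets (simplex, ℓ1-ball, spectrahedron) the LMO usually has a cheaper closed form.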
Step 4: Determine the Direction of Movement
The direction is simply the vector connecting the current point \(x_k\) to the target vertex \(s_k\).
$$ d_k = s_k - x_k $$
Step 5: Find the Optimal Step Size \(\gamma_k\)
We look for the best step size \(\gamma_k \in [0, 1]\) that minimizes \(f\) along the line segment between \(x_k\) and \(s_k\). We therefore solve a minimization problem in a single variable:
$$ \gamma_k = \arg\min_{\gamma \in [0, 1]} f(x_k + \gamma d_k) $$
(Several methods for choosing \(\gamma\) exist, see Section 3).
Step 6: Update the Point
We move by a step of size \(\gamma_k\) in the direction \(d_k\).
$$ x_{k+1} = x_k + \gamma_k d_k $$
Step 7: Check for Convergence
We stop if the point barely moves (\( \|x_{k+1} - x_k\| < \varepsilon \)) or if the function value stagnates. Otherwise, we return to Step 2.
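Steps 1 through 7 can be sketched as a single loop. This is a minimal illustration assuming \(D\) is a polytope given by its vertex list, and it uses the diminishing step size \(\gamma_k = 2/(k+2)\) discussed in Section 3 (the function and variable names are ours):

```python
import numpy as np

def frank_wolfe(grad_f, vertices, x0, max_iter=5000, tol=1e-8):
    """Frank-Wolfe sketch: D is a polytope given by its vertices,
    step size gamma_k = 2/(k+2)."""
    x = np.asarray(x0, dtype=float)          # Step 1: feasible start
    for k in range(max_iter):
        g = grad_f(x)                        # Step 2: gradient
        scores = [np.dot(v, g) for v in vertices]
        s = vertices[int(np.argmin(scores))] # Step 3: LMO over vertices
        d = s - x                            # Step 4: direction
        gamma = 2.0 / (k + 2.0)              # Step 5: diminishing step
        x_next = x + gamma * d               # Step 6: update
        if np.linalg.norm(x_next - x) < tol: # Step 7: convergence check
            return x_next
        x = x_next
    return x

# Illustration: f(x, y) = (x-1)^2 + (y-1.2)^2 on the triangle x + y <= 3
grad_f = lambda p: np.array([2 * (p[0] - 1), 2 * (p[1] - 1.2)])
verts = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
x_star = frank_wolfe(grad_f, verts, [0.0, 0.0])
# x_star approaches (1, 1.2), the unconstrained minimizer, which lies in D
```

Because every update is a convex combination of \(x_k\) and a vertex \(s_k\), the iterate never leaves \(D\), with no projection needed.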
3. Focus: How to Choose the Step Size \(\gamma\)?
The choice of step size \(\gamma\) is crucial for the algorithm's performance. Here are the most common methods:
1️⃣ Exact Line Search
This is the most precise method. We analytically solve the minimization problem for \(\gamma\) by differentiating the function \( \phi(\gamma) = f(x_k + \gamma d_k) \) and finding where the derivative is zero (\(\phi'(\gamma) = 0\)).
- Advantage: Fastest convergence in terms of number of iterations.
- Disadvantage: Can be computationally expensive if the derivative is complex.
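For a simple quadratic such as \(f(x,y) = (x-a)^2 + (y-b)^2\), the one-dimensional function \(\phi(\gamma)\) is itself quadratic, so \(\phi'(\gamma) = 0\) has a closed-form solution. A small sketch under that assumption (the helper name is ours):

```python
import numpy as np

def exact_step_quadratic(g, d, hess_scale=2.0):
    """Exact line search for f(x) = ||x - c||^2 (Hessian = 2I).

    phi(gamma) = f(x + gamma * d) is quadratic in gamma, so
    phi'(gamma) = 0 gives gamma* = -<g, d> / (hess_scale * ||d||^2),
    clipped to [0, 1] to stay on the segment [x, s].
    """
    gamma = -np.dot(g, d) / (hess_scale * np.dot(d, d))
    return float(np.clip(gamma, 0.0, 1.0))

# With g = (-2, -2.4) and d = (0, 3): gamma* = 7.2 / 18 ≈ 0.4
gamma = exact_step_quadratic(np.array([-2.0, -2.4]), np.array([0.0, 3.0]))
```

For non-quadratic objectives, a scalar solver (e.g. bisection or golden-section search on \([0,1]\)) plays the same role at a higher cost per iteration.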
2️⃣ Diminishing Step Size
We use a predefined sequence that guarantees convergence. The standard choice is:
$$ \gamma_k = \frac{2}{k+2} $$
The sequence is thus: \(1, 2/3, 1/2, 2/5, \dots\). The steps become smaller and smaller.
- Advantage: Very simple to implement, guaranteed convergence.
- Disadvantage: Can be slower in practice than exact line search.
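The sequence above is trivial to generate in code:

```python
# First terms of the diminishing step sequence gamma_k = 2/(k+2)
gammas = [2 / (k + 2) for k in range(4)]
# k = 0, 1, 2, 3 give 1, 2/3, 1/2, 2/5
```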
3️⃣ Fixed Step Size
We choose a constant value, for example \(\gamma_k = 0.1\). This is the simplest method.
- Advantage: Extremely simple, no extra computation.
- Disadvantage: Risk of slow convergence or oscillations if the step size is poorly chosen.
4. Detailed Complete Example
Let's solve the following problem:
$$ \min_{(x,y) \in D} f(x,y) = (x-1)^2 + (y-1.2)^2 $$
Subject to the constraints:
$$ x+y \le 3, \quad x \ge 0, \quad y \ge 0 $$
The vertices of the feasible region \(D\) are (0,0), (3,0), and (0,3).
Iteration 0 (k=0)
- Initial Point: We choose \( x_0 = (0, 0) \).
- Gradient: \( \nabla f(x,y) = [2(x-1), 2(y-1.2)] \).
At \(x_0\), we have \( \nabla f(0,0) = [-2, -2.4] \).
- LMO: We seek \( s_0 = \arg\min_{s \in D} \langle s, [-2, -2.4] \rangle \).
We need to minimize \(-2x - 2.4y\). We test the vertices:
- (0,0): -2(0) - 2.4(0) = 0
- (3,0): -2(3) - 2.4(0) = -6
- (0,3): -2(0) - 2.4(3) = -7.2 (Minimum)
- Direction: \( d_0 = s_0 - x_0 = (0,3) - (0,0) = (0,3) \).
- Optimal Step (Exact Line Search): We want to minimize \( \phi(\gamma) = f(x_0 + \gamma d_0) = f(0, 3\gamma) \).
\( \phi(\gamma) = (0-1)^2 + (3\gamma - 1.2)^2 = 1 + (3\gamma - 1.2)^2 \).
We differentiate: \( \phi'(\gamma) = 2 \times 3 \times (3\gamma - 1.2) = 18\gamma - 7.2 \).
We solve \( \phi'(\gamma) = 0 \Rightarrow 18\gamma = 7.2 \Rightarrow \gamma = 0.4 \).
The optimal step is \( \gamma_0 = 0.4 \).
- Update: \( x_1 = x_0 + \gamma_0 d_0 = (0,0) + 0.4 \times (0,3) = (0, 1.2) \).
Iteration 1 (k=1)
- Current Point: \( x_1 = (0, 1.2) \).
- Gradient: \( \nabla f(0, 1.2) = [2(0-1), 2(1.2-1.2)] = [-2, 0] \).
- LMO: We seek \( s_1 = \arg\min_{s \in D} \langle s, [-2, 0] \rangle \).
We need to minimize \(-2x\). We test the vertices:
- (0,0): -2(0) = 0
- (3,0): -2(3) = -6 (Minimum)
- (0,3): -2(0) = 0
- Direction: \( d_1 = s_1 - x_1 = (3,0) - (0,1.2) = (3, -1.2) \).
- Optimal Step: We minimize \( \phi(\gamma) = f(x_1 + \gamma d_1) = f(3\gamma, 1.2 - 1.2\gamma) \).
\( \phi(\gamma) = (3\gamma-1)^2 + (1.2-1.2\gamma-1.2)^2 = (3\gamma-1)^2 + (-1.2\gamma)^2 \).
\( \phi'(\gamma) = 2 \times 3 \times (3\gamma-1) + 2 \times (-1.2) \times (-1.2\gamma) = 18\gamma - 6 + 2.88\gamma = 20.88\gamma - 6 \).
We solve \( \phi'(\gamma) = 0 \Rightarrow \gamma = 6 / 20.88 \approx 0.287 \).
The optimal step is \( \gamma_1 \approx 0.287 \).
- Update: \( x_2 = (0, 1.2) + 0.287 \times (3, -1.2) \approx (0.862, 0.855) \).
The algorithm will continue in this manner, getting closer with each iteration to the optimal solution \((1, 1.2)\), the unconstrained minimizer of \(f\), which happens to lie inside \(D\).
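The two iterations above can be reproduced numerically. This sketch hard-codes the quadratic objective of this example, so the exact line search has the closed form \(\gamma = -\langle g, d \rangle / (2\|d\|^2)\):

```python
import numpy as np

# f(x, y) = (x - 1)^2 + (y - 1.2)^2 over the triangle x + y <= 3, x, y >= 0
grad = lambda p: np.array([2 * (p[0] - 1), 2 * (p[1] - 1.2)])
verts = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]

x = np.array([0.0, 0.0])                 # x_0
for k in range(2):
    g = grad(x)
    s = verts[int(np.argmin([v @ g for v in verts]))]  # LMO: best vertex
    d = s - x                                           # direction
    gamma = -(g @ d) / (2 * d @ d)   # exact line search (Hessian = 2I)
    x = x + gamma * d
# k = 0: gamma = 0.4,    x_1 = (0, 1.2)
# k = 1: gamma ≈ 0.287,  x_2 ≈ (0.862, 0.855)
```

Running the loop longer shows the iterates zig-zagging between pulls toward (3,0) and (0,3) while converging to the optimum \((1, 1.2)\).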