1. Introduction: What is Kelley's Method?
Kelley's Method, also known as the Cutting Plane Method, is a classic and powerful algorithm for minimizing a convex function over a convex set. It is particularly useful when the objective function \(f(x)\) is complex or treated as a "black box" where we can only evaluate its value and its gradient at specific points.
The problem we are trying to solve is:
$$ \min_{x \in X} f(x) $$
Where:
- \(f(x)\) is a convex function.
- \(X\) is a convex set (e.g., a polygon or polyhedron).
🧩 The Main Idea: Build an Approximation from Below
Imagine the function \(f(x)\) is a smooth, curved bowl. Instead of trying to analyze the entire bowl at once, Kelley's method builds a simpler model of it from underneath using straight lines (or planes).
- Start with a point: Pick a point and draw the tangent line to the bowl at that point. Because the bowl is convex, this tangent line will always lie entirely below it.
- Build a simple model: This single tangent line is our first, very rough approximation of the bowl.
- Find the minimum of the model: Finding the minimum of this simple linear model is easy. This gives us a new point to investigate.
- Add a new "cut": Go to this new point on the real bowl, and add a new tangent line there. Our model is now the "upper envelope" (the maximum) of all the tangent lines we've collected.
- Repeat: As we add more and more tangent lines (or "cutting planes"), our piecewise-linear approximation gets closer and closer to the true shape of the bowl, and the minimum of our model gets closer to the true minimum.
2. The Kelley Algorithm Step-by-Step
A key property of convex functions is that they are always above their tangent lines. The tangent line at a point \(x_k\) is a linear underestimator of \(f(x)\):
$$ f(x) \ge f(x_k) + \nabla f(x_k)^T (x - x_k) $$
The algorithm uses this property to build its model.
Step 0: Initialization
Pick a starting point \(x_0 \in X\). Evaluate the function value \(f(x_0)\) and the gradient \(g_0 = \nabla f(x_0)\). Initialize a set of cuts, \(C_0\), containing just this first tangent line information.
Step 1: Solve the Linear Subproblem
At iteration \(k\), we have a model \(\hat{f}_k(x)\) which is the maximum of all tangent lines collected so far. We find the next point, \(x_{k+1}\), by minimizing this model over the feasible set \(X\):
$$ x_{k+1} = \arg\min_{x \in X} \hat{f}_k(x) = \arg\min_{x \in X} \left( \max_{i \le k} \{ f(x_i) + g_i^T(x - x_i) \} \right) $$
This is equivalent to solving the following Linear Program:
minimize \(z\)
subject to: \( z \ge f(x_i) + g_i^T(x - x_i) \) for all \(i \le k\)
and \(x \in X\)
Step 2: Add a New Cut
Evaluate the true function \(f(x_{k+1})\) and its gradient \(g_{k+1} = \nabla f(x_{k+1})\) at the new point.
Add this new tangent line information to your set of cuts. The model for the next iteration, \(\hat{f}_{k+1}(x)\), will now be more accurate.
Step 3: Check for Convergence
The algorithm generates two sequences: \(U_k = \min_{i \le k} f(x_i)\) (the best true function value found so far, an **upper bound**) and \(L_k = f_k(x_{k+1})\) (the minimum of the model, a **lower bound**).
We stop when the gap between the upper and lower bounds is small enough: $$ U_k - L_k < \varepsilon $$ If the gap is not small enough, we increment \(k\) and return to Step 1.
3. A Simple 1D Example, Step-by-Step
Let's minimize the function \(f(x) = (x-2)^2 + 1\) on the interval \(X = [0, 4]\).
The derivative (gradient) is \(f'(x) = 2(x-2)\).
Iteration 1
- Initialize: Choose \(x_0 = 0\).
- \(f(0) = (0-2)^2 + 1 = 5\)
- \(g_0 = f'(0) = 2(0-2) = -4\)
- Subproblem: Our first model has only one cut (the tangent at \(x_0\)):
\(\hat{f}_0(x) = f(x_0) + g_0(x - x_0) = 5 - 4(x - 0) = 5 - 4x\).
We solve: \(\min_{x \in [0, 4]} (5 - 4x)\).
This is a line with a negative slope, so the minimum occurs at the rightmost point of the interval.
The solution is \(x_1 = 4\). The value of the model at this point is \(L_0 = 5 - 4(4) = -11\). - Convergence Check: The best true value so far is \(U_0 = f(0) = 5\). The gap is \(U_0 - L_0 = 5 - (-11) = 16\). This is large, so we continue.
Iteration 2
- Add Cut: We need information at our new point \(x_1 = 4\).
- \(f(4) = (4-2)^2 + 1 = 5\)
- \(g_1 = f'(4) = 2(4-2) = 4\)
- Subproblem: Our model is now the maximum of two lines:
\(\hat{f}_1(x) = \max(5 - 4x, 4x - 11)\).
We want to find the minimum of this "V" shape on the interval [0, 4]. The minimum will be at the intersection of the two lines.
Solve \(5 - 4x = 4x - 11 \Rightarrow 8x = 16 \Rightarrow x = 2\).
The solution is \(x_2 = 2\). The value of the model at this point is \(L_1 = 5 - 4(2) = -3\). - Convergence Check: The best true value so far is \(\min(f(0), f(4)) = \min(5, 5) = 5\). So, \(U_1 = 5\).
The gap is \(U_1 - L_1 = 5 - (-3) = 8\). Still large, so we continue.
Iteration 3
- Add Cut: We need information at \(x_2 = 2\).
- \(f(2) = (2-2)^2 + 1 = 1\)
- \(g_2 = f'(2) = 2(2-2) = 0\)
- Subproblem: Our model is now \(\hat{f}_2(x) = \max(5 - 4x, 4x - 11, 1)\).
The minimum of this new, more accurate model is clearly 1, which occurs at \(x=2\).
The solution is \(x_3 = 2\). The value of the model is \(L_2 = 1\). - Convergence Check: The best true value so far is \(\min(f(0), f(4), f(2)) = \min(5, 5, 1) = 1\). So, \(U_2 = 1\).
The gap is \(U_2 - L_2 = 1 - 1 = 0\). The gap is zero, so the algorithm has converged.
4. Kelley's Method vs. Frank-Wolfe
While both algorithms solve similar problems, their strategies are fundamentally different.
| Aspect | Frank-Wolfe | Kelley (Cutting Plane) |
|---|---|---|
| Core Idea | Find a direction (towards a vertex) and take a small step. | Build an increasingly accurate linear model of the function. |
| What it Linearizes | The objective function at a single point to find a direction. | The objective function at many points to build a global model. |
| How it Moves | Moves the point \(x_k\) directly. The problem definition never changes. | The point \(x_k\) is the solution to an expanding LP. The problem gets bigger each step. |
| Memory | "Memoryless" - only needs the current point \(x_k\). | Requires storing all previous gradients and points to build the model. |