Chapter V: Convex Optimization

Kelley's Cutting Plane Method

1. Introduction: What is Kelley's Method?

Kelley's Method, also known as the Cutting Plane Method, is a classic and powerful algorithm for minimizing a convex function over a convex set. It is particularly useful when the objective function $f(x)$ is complex or treated as a "black box" where we can only evaluate its value and its gradient at specific points.

The problem we are trying to solve is:

$$ \min_{x \in X} f(x) $$

Where:

$f(x)$ is a convex function.
$X$ is a convex set (e.g., a polygon or polyhedron).

🧩 The Main Idea: Build an Approximation from Below

Imagine the function $f(x)$ is a smooth, curved bowl. Instead of trying to analyze the entire bowl at once, Kelley's method builds a simpler model of it from underneath using straight lines (or planes).

Start with a point: Pick a point and draw the tangent line to the bowl at that point. Because the bowl is convex, this tangent line will always lie entirely below it.
Build a simple model: This single tangent line is our first, very rough approximation of the bowl.
Find the minimum of the model: Finding the minimum of this simple linear model is easy. This gives us a new point to investigate.
Add a new "cut": Go to this new point on the real bowl, and add a new tangent line there. Our model is now the "upper envelope" (the maximum) of all the tangent lines we've collected.
Repeat: As we add more and more tangent lines (or "cutting planes"), our piecewise-linear approximation gets closer and closer to the true shape of the bowl, and the minimum of our model gets closer to the true minimum.

2. The Kelley Algorithm Step-by-Step

A key property of convex functions is that they are always above their tangent lines. The tangent line at a point $x_k$ is a linear underestimator of $f(x)$:

$$ f(x) \ge f(x_k) + \nabla f(x_k)^T (x - x_k) $$

The algorithm uses this property to build its model.

Step 0: Initialization

Pick a starting point $x_0 \in X$. Evaluate the function value $f(x_0)$ and the gradient $g_0 = \nabla f(x_0)$. Initialize a set of cuts, $C_0$, containing just this first tangent line information.

Step 1: Solve the Linear Subproblem

At iteration $k$, we have a model $\hat{f}_k(x)$ which is the maximum of all tangent lines collected so far. We find the next point, $x_{k+1}$, by minimizing this model over the feasible set $X$:

$$ x_{k+1} = \arg\min_{x \in X} \hat{f}_k(x) = \arg\min_{x \in X} \left( \max_{i \le k} \{ f(x_i) + g_i^T(x - x_i) \} \right) $$

This is equivalent to solving the following Linear Program:

minimize $z$
subject to: $ z \ge f(x_i) + g_i^T(x - x_i) $ for all $i \le k$
and $x \in X$

Step 2: Add a New Cut

Evaluate the true function $f(x_{k+1})$ and its gradient $g_{k+1} = \nabla f(x_{k+1})$ at the new point.

Add this new tangent line information to your set of cuts. The model for the next iteration, $\hat{f}_{k+1}(x)$, will now be more accurate.

Step 3: Check for Convergence

The algorithm generates two sequences: $U_k = \min_{i \le k} f(x_i)$ (the best true function value found so far, an **upper bound**) and $L_k = f_k(x_{k+1})$ (the minimum of the model, a **lower bound**).

We stop when the gap between the upper and lower bounds is small enough: $$ U_k - L_k < \varepsilon $$ If the gap is not small enough, we increment $k$ and return to Step 1.

3. A Simple 1D Example, Step-by-Step

Let's minimize the function $f(x) = (x-2)^2 + 1$ on the interval $X = [0, 4]$.

The derivative (gradient) is $f'(x) = 2(x-2)$.

Iteration 1

Initialize: Choose $x_0 = 0$.
- $f(0) = (0-2)^2 + 1 = 5$
- $g_0 = f'(0) = 2(0-2) = -4$
Subproblem: Our first model has only one cut (the tangent at $x_0$):
$\hat{f}_0(x) = f(x_0) + g_0(x - x_0) = 5 - 4(x - 0) = 5 - 4x$.
We solve: $\min_{x \in [0, 4]} (5 - 4x)$.
This is a line with a negative slope, so the minimum occurs at the rightmost point of the interval.
The solution is $x_1 = 4$. The value of the model at this point is $L_0 = 5 - 4(4) = -11$.
Convergence Check: The best true value so far is $U_0 = f(0) = 5$. The gap is $U_0 - L_0 = 5 - (-11) = 16$. This is large, so we continue.

Iteration 2

Add Cut: We need information at our new point $x_1 = 4$.
- $f(4) = (4-2)^2 + 1 = 5$
- $g_1 = f'(4) = 2(4-2) = 4$
The new tangent line is $h_1(x) = f(4) + g_1(x-4) = 5 + 4(x-4) = 4x - 11$.
Subproblem: Our model is now the maximum of two lines:
$\hat{f}_1(x) = \max(5 - 4x, 4x - 11)$.
We want to find the minimum of this "V" shape on the interval [0, 4]. The minimum will be at the intersection of the two lines.
Solve $5 - 4x = 4x - 11 \Rightarrow 8x = 16 \Rightarrow x = 2$.
The solution is $x_2 = 2$. The value of the model at this point is $L_1 = 5 - 4(2) = -3$.
Convergence Check: The best true value so far is $\min(f(0), f(4)) = \min(5, 5) = 5$. So, $U_1 = 5$.
The gap is $U_1 - L_1 = 5 - (-3) = 8$. Still large, so we continue.

Iteration 3

Add Cut: We need information at $x_2 = 2$.
- $f(2) = (2-2)^2 + 1 = 1$
- $g_2 = f'(2) = 2(2-2) = 0$
The new tangent line is $h_2(x) = f(2) + g_2(x-2) = 1 + 0(x-2) = 1$. This is a horizontal line.
Subproblem: Our model is now $\hat{f}_2(x) = \max(5 - 4x, 4x - 11, 1)$.
The minimum of this new, more accurate model is clearly 1, which occurs at $x=2$.
The solution is $x_3 = 2$. The value of the model is $L_2 = 1$.
Convergence Check: The best true value so far is $\min(f(0), f(4), f(2)) = \min(5, 5, 1) = 1$. So, $U_2 = 1$.
The gap is $U_2 - L_2 = 1 - 1 = 0$. The gap is zero, so the algorithm has converged.

The optimal solution is x = 2, with an optimal value of f(x) = 1.

4. Kelley's Method vs. Frank-Wolfe

While both algorithms solve similar problems, their strategies are fundamentally different.

Aspect	Frank-Wolfe	Kelley (Cutting Plane)
Core Idea	Find a direction (towards a vertex) and take a small step.	Build an increasingly accurate linear model of the function.
What it Linearizes	The objective function at a single point to find a direction.	The objective function at many points to build a global model.
How it Moves	Moves the point $x_k$ directly. The problem definition never changes.	The point $x_k$ is the solution to an expanding LP. The problem gets bigger each step.
Memory	"Memoryless" - only needs the current point $x_k$.	Requires storing all previous gradients and points to build the model.

Last modified: Tuesday, 11 November 2025, 11:27 PM

Recherche opérationnelle Part 3: Kelley's Cutting Plane Method