Chapter V: Convex Optimization

Kelley's Cutting Plane Method

1. Introduction: What is Kelley's Method?

Kelley's Method, also known as the Cutting Plane Method, is a classic and powerful algorithm for minimizing a convex function over a convex set. It is particularly useful when the objective function \(f(x)\) is complex or treated as a "black box" where we can only evaluate its value and its gradient at specific points.

The problem we are trying to solve is:

$$ \min_{x \in X} f(x) $$

Where:

  • \(f(x)\) is a convex function.
  • \(X\) is a convex set (e.g., a polygon or polyhedron).

🧩 The Main Idea: Build an Approximation from Below

Imagine the function \(f(x)\) is a smooth, curved bowl. Instead of trying to analyze the entire bowl at once, Kelley's method builds a simpler model of it from underneath using straight lines (or planes).

  1. Start with a point: Pick a point and draw the tangent line to the bowl at that point. Because the bowl is convex, this tangent line will always lie entirely below it.
  2. Build a simple model: This single tangent line is our first, very rough approximation of the bowl.
  3. Find the minimum of the model: Finding the minimum of this simple linear model is easy. This gives us a new point to investigate.
  4. Add a new "cut": Go to this new point on the real bowl, and add a new tangent line there. Our model is now the "upper envelope" (the maximum) of all the tangent lines we've collected.
  5. Repeat: As we add more and more tangent lines (or "cutting planes"), our piecewise-linear approximation gets closer and closer to the true shape of the bowl, and the minimum of our model gets closer to the true minimum.

2. The Kelley Algorithm Step-by-Step

A key property of differentiable convex functions is that they always lie above their tangent lines. The tangent line at a point \(x_k\) is a linear underestimator of \(f(x)\):

$$ f(x) \ge f(x_k) + \nabla f(x_k)^T (x - x_k) $$

The algorithm uses this property to build its model.
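This underestimator property is easy to check numerically. The snippet below (an illustration, not from the text) evaluates the tangent line built at one point and verifies it never exceeds a convex function; the function chosen here is the \(f(x) = (x-2)^2 + 1\) used in the worked example later.

```python
import numpy as np

# Convex test function and its derivative (gradient in 1D).
f = lambda x: (x - 2) ** 2 + 1
grad = lambda x: 2 * (x - 2)

x_k = 0.5                          # point where the cut (tangent line) is built
xs = np.linspace(-3.0, 7.0, 101)   # points at which we compare f and the cut
cut = f(x_k) + grad(x_k) * (xs - x_k)

# The tangent line is a global underestimator of the convex function.
assert np.all(f(xs) >= cut - 1e-12)
```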

Step 0: Initialization

Pick a starting point \(x_0 \in X\). Evaluate the function value \(f(x_0)\) and the gradient \(g_0 = \nabla f(x_0)\). Initialize a set of cuts, \(C_0\), containing just this first tangent line information.

Step 1: Solve the Linear Subproblem

At iteration \(k\), we have a model \(\hat{f}_k(x)\) which is the maximum of all tangent lines collected so far. We find the next point, \(x_{k+1}\), by minimizing this model over the feasible set \(X\):

$$ x_{k+1} = \arg\min_{x \in X} \hat{f}_k(x) = \arg\min_{x \in X} \left( \max_{i \le k} \{ f(x_i) + g_i^T(x - x_i) \} \right) $$

This is equivalent to solving the following Linear Program:

minimize \(z\)
subject to: \( z \ge f(x_i) + g_i^T(x - x_i) \) for all \(i \le k\)
and \(x \in X\)
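The epigraph LP above can be handed to any LP solver. The sketch below (an assumption on my part, not code from the text) sets it up with `scipy.optimize.linprog` for the 1D example \(f(x) = (x-2)^2 + 1\) on \(X = [0, 4]\), using the two cuts built at \(x = 0\) and \(x = 4\) in the worked example; the variables are the pair \((x, z)\).

```python
from scipy.optimize import linprog

# Each cut i is stored as (x_i, f(x_i), g_i) and contributes the constraint
#   z >= f(x_i) + g_i * (x - x_i)   <=>   g_i * x - z <= g_i * x_i - f(x_i)
cuts = [(0.0, 5.0, -4.0),   # tangent at x = 0: f = 5, slope -4
        (4.0, 5.0, 4.0)]    # tangent at x = 4: f = 5, slope +4

A_ub = [[g, -1.0] for (_, _, g) in cuts]
b_ub = [g * xi - fi for (xi, fi, g) in cuts]

res = linprog(c=[0.0, 1.0],                       # minimize z
              A_ub=A_ub, b_ub=b_ub,
              bounds=[(0.0, 4.0), (None, None)])  # x in [0, 4], z free
x_next, lower_bound = res.x                       # expect x = 2, z = -3
```

The solver lands at \(x = 2\) with \(z = -3\), matching the hand computation of iteration 2 in the example below.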

Step 2: Add a New Cut

Evaluate the true function \(f(x_{k+1})\) and its gradient \(g_{k+1} = \nabla f(x_{k+1})\) at the new point.

Add this new tangent line information to your set of cuts. The model for the next iteration, \(\hat{f}_{k+1}(x)\), will now be more accurate.

Step 3: Check for Convergence

The algorithm generates two sequences: \(U_k = \min_{i \le k} f(x_i)\) (the best true function value found so far, an **upper bound**) and \(L_k = \hat{f}_k(x_{k+1})\) (the minimum of the model, a **lower bound**).

We stop when the gap between the upper and lower bounds is small enough: $$ U_k - L_k < \varepsilon $$ If the gap is not small enough, we increment \(k\) and return to Step 1.
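Steps 0-3 can be put together into a short loop. The sketch below is a simplified illustration (the function name `kelley_1d` and the grid-based subproblem are my choices, not from the text): in 1D the LP subproblem is replaced by brute-force minimization of the piecewise-linear model over a fine grid, which is equivalent for illustration purposes.

```python
import numpy as np

def kelley_1d(f, grad, lo, hi, x0, eps=1e-6, max_iter=50):
    """Kelley's method on [lo, hi]; the LP step is a grid search for simplicity."""
    grid = np.linspace(lo, hi, 100001)
    cuts = []                         # list of (x_i, f(x_i), g_i)
    x_k, U = x0, np.inf
    for _ in range(max_iter):
        fx, gx = f(x_k), grad(x_k)
        U = min(U, fx)                # upper bound: best true value so far
        cuts.append((x_k, fx, gx))    # add the new cut
        # Model = pointwise max of all tangent lines collected so far.
        model = np.max([fi + gi * (grid - xi) for xi, fi, gi in cuts], axis=0)
        j = int(np.argmin(model))
        L = model[j]                  # lower bound: minimum of the model
        if U - L < eps:               # convergence check: gap small enough?
            break
        x_k = grid[j]                 # next point = minimizer of the model
    return x_k, U, L

x_star, U, L = kelley_1d(lambda x: (x - 2)**2 + 1, lambda x: 2*(x - 2),
                         0.0, 4.0, 0.0)
# converges to x close to 2 with U close to 1
```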

3. A Simple 1D Example, Step-by-Step

Let's minimize the function \(f(x) = (x-2)^2 + 1\) on the interval \(X = [0, 4]\).

The derivative (gradient) is \(f'(x) = 2(x-2)\).

Iteration 1

  1. Initialize: Choose \(x_0 = 0\).
    • \(f(0) = (0-2)^2 + 1 = 5\)
    • \(g_0 = f'(0) = 2(0-2) = -4\)
  2. Subproblem: Our first model has only one cut (the tangent at \(x_0\)):
    \(\hat{f}_0(x) = f(x_0) + g_0(x - x_0) = 5 - 4(x - 0) = 5 - 4x\).
    We solve: \(\min_{x \in [0, 4]} (5 - 4x)\).
    This is a line with a negative slope, so the minimum occurs at the rightmost point of the interval.
    The solution is \(x_1 = 4\). The value of the model at this point is \(L_0 = 5 - 4(4) = -11\).
  3. Convergence Check: The best true value so far is \(U_0 = f(0) = 5\). The gap is \(U_0 - L_0 = 5 - (-11) = 16\). This is large, so we continue.

Iteration 2

  1. Add Cut: We need information at our new point \(x_1 = 4\).
    • \(f(4) = (4-2)^2 + 1 = 5\)
    • \(g_1 = f'(4) = 2(4-2) = 4\)
    The new tangent line is \(h_1(x) = f(4) + g_1(x-4) = 5 + 4(x-4) = 4x - 11\).
  2. Subproblem: Our model is now the maximum of two lines:
    \(\hat{f}_1(x) = \max(5 - 4x, 4x - 11)\).
    We want to find the minimum of this "V" shape on the interval [0, 4]. The minimum will be at the intersection of the two lines.
    Solve \(5 - 4x = 4x - 11 \Rightarrow 8x = 16 \Rightarrow x = 2\).
    The solution is \(x_2 = 2\). The value of the model at this point is \(L_1 = 5 - 4(2) = -3\).
  3. Convergence Check: The best true value so far is \(\min(f(0), f(4)) = \min(5, 5) = 5\). So, \(U_1 = 5\).
    The gap is \(U_1 - L_1 = 5 - (-3) = 8\). Still large, so we continue.

Iteration 3

  1. Add Cut: We need information at \(x_2 = 2\).
    • \(f(2) = (2-2)^2 + 1 = 1\)
    • \(g_2 = f'(2) = 2(2-2) = 0\)
    The new tangent line is \(h_2(x) = f(2) + g_2(x-2) = 1 + 0(x-2) = 1\). This is a horizontal line.
  2. Subproblem: Our model is now \(\hat{f}_2(x) = \max(5 - 4x, 4x - 11, 1)\).
    The minimum value of this new, more accurate model is 1, attained everywhere on the interval \([1, 3]\) (where the flat cut dominates the other two).
    We take the solution \(x_3 = 2\). The value of the model is \(L_2 = 1\).
  3. Convergence Check: The best true value so far is \(\min(f(0), f(4), f(2)) = \min(5, 5, 1) = 1\). So, \(U_2 = 1\).
    The gap is \(U_2 - L_2 = 1 - 1 = 0\). The gap is zero, so the algorithm has converged.
The optimal solution is \(x = 2\), with an optimal value of \(f(x) = 1\).
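The hand computation above can be replayed in a few lines. The script below (an illustrative check, not from the text) encodes the three cuts derived in the iterations and confirms the bound sequences \((U_k, L_k)\) and the gaps 16, 8, and 0.

```python
# The function and the three cuts derived in iterations 1-3.
f = lambda x: (x - 2) ** 2 + 1
cut0 = lambda x: 5 - 4 * x     # tangent at x = 0
cut1 = lambda x: 4 * x - 11    # tangent at x = 4

# Iteration 1: model = cut0, minimized at x = 4.
U0, L0 = f(0), cut0(4)
assert (U0, L0) == (5, -11) and U0 - L0 == 16

# Iteration 2: model = max(cut0, cut1), minimized at their intersection x = 2.
U1, L1 = min(f(0), f(4)), cut0(2)
assert (U1, L1) == (5, -3) and U1 - L1 == 8

# Iteration 3: the flat cut at height 1 joins the model; its minimum value is 1.
U2, L2 = min(f(0), f(4), f(2)), 1
assert (U2, L2) == (1, 1) and U2 - L2 == 0
```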

4. Kelley's Method vs. Frank-Wolfe

While both algorithms solve similar problems, their strategies are fundamentally different.

Core Idea
  • Frank-Wolfe: Find a direction (towards a vertex) and take a small step.
  • Kelley (Cutting Plane): Build an increasingly accurate linear model of the function.

What it Linearizes
  • Frank-Wolfe: The objective function at a single point, to find a direction.
  • Kelley: The objective function at many points, to build a global model.

How it Moves
  • Frank-Wolfe: Moves the point \(x_k\) directly. The problem definition never changes.
  • Kelley: The point \(x_k\) is the solution to an expanding LP. The problem gets bigger each step.

Memory
  • Frank-Wolfe: "Memoryless" - only needs the current point \(x_k\).
  • Kelley: Requires storing all previous points and gradients to build the model.
Last modified: Tuesday, 11 November 2025, 23:27