From classical heuristics to AI-assisted algorithm design
Classical algorithms guarantee optimal solutions through exhaustive search or mathematical proof. However, many real-world problems belong to the NP-Hard class, where the solution space grows exponentially:
The chart below shows how exact algorithms (exponential) quickly become infeasible while heuristics (polynomial) scale gracefully:
Exact methods fail at scale (n>30), while heuristics remain practical even for n=1000+
Definition (heuristic): A practical technique that quickly finds a good-enough solution using problem-specific knowledge.
Characteristics:
Definition (metaheuristic): A general-purpose framework that guides subordinate heuristics to escape local optima and find good solutions across diverse problem types.
Characteristics:
| Landscape | Best Methods | Guarantee | Example |
|---|---|---|---|
| Single global minimum | Gradient descent, Newton's method | Global optimum found | Least squares regression |
| Multiple local minima | PSO, GA, SA | Good approximation likely | Neural network training |
| Disconnected feasible regions | GA, ACO, Tabu Search | Near-optimal after tuning | TSP, scheduling |
| Approach | Guarantee | Time | Use when | Examples |
|---|---|---|---|---|
| Exact | Proven global optimum | Often exponential in problem size | Small instances (n<50), optimality critical | Branch & Bound, Dynamic Programming |
| Approximation | Within factor C of optimal (e.g., C=2) | Polynomial in problem size | Need theoretical bounds + reasonable speed | Christofides TSP (1.5x), MST-based approaches |
| Metaheuristic | None, but often near-optimal in practice | Controlled by user (usually seconds to hours) | Hard problems, time-constrained, quality vs. speed trade-off | GA, PSO, SA, ACO, hybrids |
Make locally optimal choices at each step, hoping the sequence leads to a good global solution. No backtracking.
Algorithm:
Complexity: O(n²) - Fast!
Quality: Often 20-50% worse than optimal
Advantage: Quick, provides decent initial solution
Disadvantage: Local decisions can lead to poor global outcome
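As a concrete illustration of the greedy idea, here is a minimal sketch of the nearest-neighbor construction for TSP (the function name and toy instance are illustrative, not from the source): from the current city, always visit the closest unvisited city, with no backtracking.

```python
import math

def nearest_neighbor_tour(cities):
    """Greedy TSP construction: from the current city, always go to the
    nearest unvisited city. O(n^2) overall, no backtracking."""
    unvisited = set(range(1, len(cities)))
    tour = [0]                                  # start at city 0
    while unvisited:
        last = cities[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, cities[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

cities = [(0, 0), (0, 1), (2, 0), (2, 1)]
print(nearest_neighbor_tour(cities))  # [0, 1, 3, 2]
```

Each step is locally optimal, but the tour that results can still be far from the global optimum, which is exactly the weakness noted above.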
Start from an initial solution and explore its neighborhood to find improvements. Stop when no better neighbors exist (local optimum).
Neighborhood Definition: A set of solutions reachable via small modifications (e.g., swap two cities in TSP).
Core Steps:
Common Neighborhood Moves:
Advantage: Simple, fast, and easy to implement
Neighborhood Definition: Swap two edges in the tour
Current tour: 1→2→3→4→5→1
Neighbor (2-opt swap edges 2-3 and 4-5): 1→2→4→3→5→1
Hill Climbing Procedure:
Quality: Often 5-15% worse than optimal
Problem: Stops at local optima, can't escape!
Red line (pure local search) gets stuck. Green line (metaheuristic) escapes.
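The hill-climbing procedure with the 2-opt neighborhood can be sketched as follows (a first-improvement variant on a toy 4-city instance; the names and data are illustrative): reversing a segment of the tour is equivalent to swapping two edges, and the search stops exactly when no improving neighbor exists.

```python
import math

def tour_length(tour, dist):
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def two_opt(tour, dist):
    """First-improvement hill climbing in the 2-opt neighborhood:
    reversing tour[i:j] replaces two tour edges with two new ones."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand, dist) < tour_length(tour, dist):
                    tour, improved = cand, True
    return tour  # local optimum: no improving 2-opt move remains

pts = [(0, 0), (1, 0), (1, 1), (0, 1)]   # unit square, optimal length 4
dist = [[math.dist(a, b) for b in pts] for a in pts]
best = two_opt([0, 2, 1, 3], dist)       # start from a self-crossing tour
```

On this instance the crossing start tour has length 2 + 2√2 ≈ 4.83, and a single 2-opt move uncrosses it to the optimal perimeter of length 4; on larger instances the search stops at whatever local optimum it reaches first.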
Run local search from multiple random starting solutions. Keep best.
Formula: n_starts × O(local_search_time)
Simple but effective for small time budgets.
When stuck at local optimum, restart with random perturbation of current solution.
Balance: How much noise? When to restart?
Foundation for later metaheuristics like SA.
🎯 Key Insight: Metaheuristics extend classical heuristics by adding strategic escape mechanisms (probabilistic acceptance, population diversity, memory structures) to navigate away from local optima while maintaining computational efficiency.
Molten metal at high temperature has atoms in random positions. As temperature cools slowly, atoms settle into low-energy crystalline structure. If cooled too fast, atoms freeze in poor arrangement (defects).
Simulated Annealing is a probabilistic local search algorithm inspired by the physics of metallurgical annealing: cooling a hot metal slowly so it reaches a crystalline (low-energy) state. It escapes local optima by occasionally accepting worse solutions, especially early on, when the "temperature" is high.
Initialize: solution x ← random, T ← T₀ (high)
Start with a random solution, then iteratively improve it:
Repeat until T ≈ 0:
| Schedule | Formula | Typical value | Character |
|---|---|---|---|
| Geometric | T_k = α·T_{k-1} | α ≈ 0.95 | Fast, common, good balance |
| Linear | T_k = T₀ - β·k | β ≈ 0.01 | Slow, more exploration, theoretical |
| Logarithmic | T_k = T₀ / ln(k+1) | n/a | Optimal convergence, impractically slow |
Maintain a memory of visited solutions and forbid recent moves (tabu). The oldest restrictions expire automatically after a fixed tenure, and an aspiration criterion can override a tabu when the move would yield a new best solution.
Initialize: solution x ← best_greedy, tabu_list ← empty
Repeat for max_iterations:
Function: Prevent cycling back to recently visited solutions
Implementation: Tabu list of size 7-20
Effect: Forces exploration away from local region
Function: Intensify search around good regions, diversify if stuck
Implementation: Frequency matrix of visited moves
Effect: Penalize frequently-visited solutions, encourage new areas
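A minimal short-term-memory sketch of the loop above (a toy 1-D integer problem; a fuller version would add an aspiration override and the long-term frequency matrix described above):

```python
from collections import deque

def tabu_search(f, x0, neighbors, tenure=7, iters=100):
    """Tabu search with short-term memory only: always move to the best
    non-tabu neighbor, even if it is worse (this forced move is what
    escapes local optima); solutions stay tabu for `tenure` steps."""
    x = x0
    best, fbest = x0, f(x0)
    tabu = deque([x0], maxlen=tenure)   # fixed-length recency memory
    for _ in range(iters):
        cands = [y for y in neighbors(x) if y not in tabu]
        if not cands:
            break                        # whole neighborhood is tabu
        x = min(cands, key=f)            # forced move to best non-tabu
        tabu.append(x)
        if f(x) < fbest:
            best, fbest = x, f(x)
    return best, fbest

nbrs = lambda x: [x - 1, x + 1]
best, val = tabu_search(lambda x: (x - 3) ** 2, x0=-5, neighbors=nbrs)
```

Because the move is forced rather than probabilistic, the trajectory climbs out of the basin deterministically, which matches the SA-vs-Tabu comparison in the table below.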
| Aspect | Simulated Annealing | Tabu Search |
|---|---|---|
| Escape Mechanism | Probabilistic (temperature) | Deterministic (memory) |
| Memory | None (stateless) | Tabu list + frequencies |
| Forced Moves | Only probabilistic | Forced to best non-tabu neighbor |
| Parameters | T₀, α (few) | Tabu tenure, aspiration (many) |
| Convergence | Theoretically guaranteed | No theoretical guarantee |
| Best For | Continuous, non-convex | Discrete, combinatorial |
| Implementation | Simple | More complex |
Genetic Algorithms (GAs) are population-based metaheuristics inspired by natural selection and genetics. They maintain a population of candidate solutions (individuals) that evolve over generations through selection, crossover, and mutation to explore the solution space.
GAs are particularly effective for complex optimization problems with large, multimodal search spaces, such as combinatorial optimization (e.g., TSP) and continuous function optimization.
Initialize: population P of size N with random individuals
Repeat for max_generations:
Example: [1,0,1,1,0,1]
Use: Boolean features, subsets, knapsack 0-1 selection
Pro: Simple ops | Con: Hamming distance bias
Example: [3.14, -2.7, 0.5]
Use: Continuous optimization, function minimization
Pro: Natural for reals | Con: Precision issues
Example: [3,1,4,2,5] (TSP tour)
Use: Ordering, sequencing, scheduling
Pro: Problem-specific | Con: Complex operators
Probability: P(select i) = f(i) / Σ_j f(j)
Assign each individual a slot on a spinning wheel proportional to its fitness; spin n times.
Pro: Simple, unbiased pressure
Con: Variance, stagnation with similar fitness
Procedure: Randomly pick k individuals, select best
Repeat n times to form parent population.
Pro: Low variance, control via k
Con: Elitist bias, slower diversity loss
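Both selection schemes fit in a few lines (the population and fitness function below are illustrative assumptions):

```python
import random

def tournament_select(pop, fitness, k=3):
    """Pick k individuals uniformly at random, return the fittest.
    Selection pressure grows with k."""
    return max(random.sample(pop, k), key=fitness)

def roulette_select(pop, fitness):
    """Fitness-proportionate selection: P(i) = f(i) / sum of f.
    Assumes non-negative fitness values."""
    total = sum(fitness(p) for p in pop)
    r = random.uniform(0, total)         # spin the wheel once
    acc = 0.0
    for p in pop:
        acc += fitness(p)
        if acc >= r:
            return p
    return pop[-1]                       # guard against rounding

random.seed(1)
pop = list(range(1, 11))                 # individuals 1..10
fit = lambda x: float(x)                 # fitness = value itself
parent = tournament_select(pop, fit, k=3)
```

With similar fitness values the roulette wheel becomes nearly uniform (the stagnation noted above), while the tournament keeps its pressure because it only compares ranks.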
Parent1: 1|0 1 1 0 1
Parent2: 0|1 0 0 1 0
Child1: 1|1 0 0 1 0
Child2: 0|0 1 1 0 1
Simple, fast, may disrupt good patterns
Parent1: 1 0|1 1 0 1|0
Parent2: 0 1|0 0 1 0|1
Child1: 1 0|0 0 1 0|0
Child2: 0 1|1 1 0 1|1
Better recombination, preserves ends
Mask: 1 0 1 0 1 0 1
Parent1: 1 0 1 1 0 1 0
Parent2: 0 1 0 0 1 0 1
Child1: 1 1 1 0 0 0 0
Most disruptive, maximum mixing
Binary: Flip each bit with probability p_m
Formula: x'_i = 1 - x_i if rand() < p_m
Typically p_m = 1/n where n = chromosome length
Real-valued: Add Gaussian noise
Formula: x'_i = x_i + N(0, σ²)
Typically σ = 0.01-0.1 × domain range
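The two mutation rules above translate directly into code (function names are illustrative):

```python
import random

def bitflip_mutation(bits, p_m=None):
    """Binary mutation: flip each bit independently with probability p_m
    (default 1/n, where n is the chromosome length)."""
    p_m = p_m if p_m is not None else 1.0 / len(bits)
    return [1 - b if random.random() < p_m else b for b in bits]

def gaussian_mutation(xs, sigma=0.1):
    """Real-valued mutation: add N(0, sigma^2) noise to each gene."""
    return [x + random.gauss(0.0, sigma) for x in xs]

random.seed(42)
child = bitflip_mutation([1, 0, 1, 1, 0, 1])
real_child = gaussian_mutation([3.14, -2.7, 0.5])
```

With p_m = 1/n, each offspring changes one bit on average, which keeps mutation a background operator rather than a random restart.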
GA convergence showing best fitness, average fitness, and population diversity (gap = worst - best)
Main Innovation: Mutation step size σ evolves with the population. If more than 1/5 of mutations produce better offspring, σ increases; otherwise it decreases (the 1/5 success rule).
CMA-ES (Covariance Matrix Adaptation):
Adapts full covariance matrix of mutation distribution. Learns which coordinate directions correlate with fitness improvement.
Each particle (potential solution) moves through search space. Movement influenced by:
Inertia Weight w
Start: w = 0.9 (explore)
End: w = 0.4 (exploit)
Linear decay over iterations
Cognitive c1
c1 = 2.0
Particle trust in own experience
Higher → more personal search
Social c2
c2 = 2.0
Particle attraction to best
Higher → faster convergence
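The three parameters combine in the standard velocity update, sketched here for a 1-D objective (the objective and bounds are illustrative assumptions; a production version would also clamp velocities):

```python
import random

def pso(f, n_particles=20, iters=100, lo=-10.0, hi=10.0,
        c1=2.0, c2=2.0, w_start=0.9, w_end=0.4):
    """Minimal 1-D PSO. Inertia w decays linearly from 0.9 to 0.4
    (explore -> exploit); c1 pulls toward each particle's personal best,
    c2 toward the swarm's global best."""
    xs = [random.uniform(lo, hi) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    pbest = xs[:]                        # personal bests
    gbest = min(xs, key=f)               # global best
    for t in range(iters):
        w = w_start + (w_end - w_start) * t / (iters - 1)
        for i in range(n_particles):
            r1, r2 = random.random(), random.random()
            vs[i] = (w * vs[i]
                     + c1 * r1 * (pbest[i] - xs[i])   # cognitive term
                     + c2 * r2 * (gbest - xs[i]))     # social term
            xs[i] += vs[i]
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i]
                if f(xs[i]) < f(gbest):
                    gbest = xs[i]
    return gbest

random.seed(0)
best = pso(lambda x: x * x)
```

Note that pbest and gbest only ever improve, so the returned value is the best point any particle has visited, not the final swarm position.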
Ants leave pheromone trails on paths. Future ants prefer paths with more pheromone. Stigmergy: Indirect communication through environment.
Transition Probability (ant at city i choosing next city j):
Pheromone Update:
After all ants complete tour:
Local pheromone update during tour construction reduces premature stagnation.
Pseudo-random rule: Choose by probability or exploit best
Bound pheromone: τ_min ≤ τ ≤ τ_max. Maintain diversity better.
Only best ant deposits pheromone per iteration
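One iteration of the basic scheme can be sketched as follows (a toy 3-city instance; the transition rule p(i→j) ∝ τ_ij^α · (1/d_ij)^β and the evaporate-then-deposit update are the standard Ant System formulas, assumed here since the original formulas did not survive extraction):

```python
import random

def aco_step(dist, tau, alpha=1.0, beta=2.0, rho=0.5, n_ants=10):
    """One Ant System iteration: each ant builds a tour using
    p(i->j) proportional to tau[i][j]**alpha * (1/dist[i][j])**beta,
    then pheromone evaporates by rho and each ant deposits 1/length."""
    n = len(dist)
    tours = []
    for _ in range(n_ants):
        tour, unvisited = [0], set(range(1, n))
        while unvisited:
            i = tour[-1]
            cands = list(unvisited)
            weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                       for j in cands]
            j = random.choices(cands, weights=weights)[0]
            tour.append(j)
            unvisited.remove(j)
        tours.append(tour)
    for i in range(n):                       # evaporation
        for j in range(n):
            tau[i][j] *= (1 - rho)
    for tour in tours:                       # deposit: shorter tours add more
        length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
        for k in range(n):
            a, b = tour[k], tour[(k + 1) % n]
            tau[a][b] += 1.0 / length
            tau[b][a] += 1.0 / length
    return tours

random.seed(3)
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
tau = [[1.0] * 3 for _ in range(3)]
tours = aco_step(dist, tau)
```

The Max-Min and best-ant variants above change only the deposit step: clamp τ into [τ_min, τ_max], or let only the iteration-best ant deposit.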
PSO excels at continuous, ACO at discrete, GA balanced across problem types
Roles: Scout (explore), forager (exploit), onlooker (selective)
Balance: Shared info about food sources
Good for unconstrained continuous
Principle: Brighter fireflies attract dimmer ones
Attraction: Decreases with distance (light absorption)
Unique distance-based attraction mechanism
Sensing: Each glowworm has local region of visibility
Movement: Toward glowworms with higher luciferin
Natural multimodal capability
No single algorithm excels at all aspects. Combine strengths: Population diversity (GA) + Local exploitation (local search) = Better results.
Structure: GA + Local Search
Tuning Decision: When to apply local search?
Intensive MA: All offspring
Pro: High-quality solutions
Con: Expensive computation
Selective MA: Top 20%
Pro: Good trade-off
Con: Tuning required
Idea: Escape local optima by perturbation, re-optimize
Perturbation Strength: Key parameter
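The ILS skeleton is short enough to show in full; the rugged integer objective below is an illustrative assumption chosen so that plain hill climbing stalls at multiples of 10:

```python
import random

f = lambda x: (x % 10) * 5 + abs(x - 42)   # local optima at multiples of 10

def hill_climb(x, lo=0, hi=100):
    """Steepest-descent local search over integer neighbors x-1, x+1."""
    while True:
        best = min((max(lo, x - 1), x, min(hi, x + 1)), key=f)
        if best == x:
            return x
        x = best

def iterated_local_search(x0, iters=50, strength=15):
    """ILS: local search, perturb the optimum, re-optimize, and keep the
    perturbed result only if it improved (better-acceptance rule)."""
    x = hill_climb(x0)
    for _ in range(iters):
        y = hill_climb(min(100, max(0, x + random.randint(-strength, strength))))
        if f(y) < f(x):
            x = y
    return x

random.seed(7)
result = iterated_local_search(90)   # plain hill climbing from 90 stalls at f=48
```

The perturbation strength is the key trade-off named above: too small and the search falls back into the same basin, too large and ILS degenerates into random restart.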
Fixed parameters rarely optimal across different problem instances or search phases. Adaptive methods adjust parameters based on search progress.
Principle: External algorithm adjusts parameters
Example: Crossover rate increases if diversity drops below threshold
Requires monitoring and decision rules
Principle: Parameters encoded in chromosome, evolve with solution
Example: ES strategy parameters mutate alongside x
CMA-ES: Full covariance matrix adaptation
Green front = Pareto optimal (non-dominated). Blue points = dominated (suboptimal in all objectives).
Key Innovation: Sort population by non-domination rank, maintain diversity
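The building block of that sort is the Pareto dominance test, sketched here for minimization (the point set is illustrative):

```python
def dominates(a, b):
    """a dominates b (minimization): a is no worse in every objective
    and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

pts = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
front = pareto_front(pts)   # (3, 4) and (5, 5) are dominated
```

NSGA-II repeatedly peels off such fronts to assign non-domination ranks, then uses a crowding distance within each front to maintain the diversity mentioned above.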
When: 0-1 decisions, feature selection, subset problems
Example TSP: Not ideal (permutation better)
Example Knapsack: Perfect (include/exclude items)
Gray coding: Reduces Hamming cliff distances
When: Ordering matters, TSP, scheduling, assignment
Example: [3,1,4,2,5] = city visit order
Operators: 2-opt, 3-opt, insertion, swap
Must preserve feasibility (valid tour)
f(x) = f_obj(x) + λ·Σ(constraint_violations)
Pro: Simple, handles all constraints uniformly
Con: Tuning ฮป difficult
Fix infeasible solution through local procedure
Example: TSP tour with duplicate cities → remove duplicates
Pro: Maintains feasibility | Con: Problem-specific
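Both constraint-handling styles can be sketched on a small 0-1 knapsack (the instance, λ value, and the greedy drop-by-ratio repair rule are illustrative assumptions):

```python
def knapsack_fitness(bits, values, weights, capacity, lam=100.0):
    """Penalty approach: total value minus lam * amount of overweight.
    lam must outweigh any value gained by violating the capacity."""
    value = sum(v for v, b in zip(values, bits) if b)
    weight = sum(w for w, b in zip(weights, bits) if b)
    return value - lam * max(0, weight - capacity)

def repair(bits, values, weights, capacity):
    """Repair approach: greedily drop the items with the worst
    value/weight ratio until the solution is feasible."""
    bits = list(bits)
    order = sorted(range(len(bits)), key=lambda i: values[i] / weights[i])
    for i in order:
        if sum(w for w, b in zip(weights, bits) if b) <= capacity:
            break
        bits[i] = 0
    return bits

values, weights, capacity = [10, 7, 4], [5, 4, 3], 8
```

The trade-off from the text shows up directly: the penalty version handles any constraint uniformly but hinges on tuning lam, while the repair version always returns feasible solutions (possibly suboptimal ones, as greedy dropping can overshoot) and must be rewritten per problem.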
Ruggedness: Many/few local optima? Use fitness autocorrelation.
Causality: Small genome changes → small fitness changes? Analyze mutation neighborhood.
Modality: Number of basins of attraction. Affects algorithm choice.
Benchmark Instance: Euclidean TSP with n=100 cities
| Algorithm | Solution Quality (%OPT) | Runtime | Parameter Tuning | Notes |
|---|---|---|---|---|
| Nearest Neighbor | 120-150% | <1ms | None | Baseline, quick |
| 2-opt Local Search | 105-115% | 100ms | Stop criteria | Often from NN start |
| Simulated Annealing | 102-108% | 500ms | T₀, α, schedule | Tuning matters! |
| Genetic Algorithm | 103-110% | 800ms | Pop, Px, Pm | Population diversity key |
| Ant Colony | 101-105% | 1000ms | τ₀, α, β, ρ | Best for discrete |
| Memetic (GA+2opt) | 100-102% | 2000ms | Hybrid params | Best overall |
Key Insights:
Convergence patterns: PSO fast initial, GA consistent, SA explores then exploits, Tabu gradual improvement
Problem: Deliver to customers using minimum vehicles, respecting time windows
Constraints: Vehicle capacity, time windows [earliest_i, latest_i], time needed per delivery
Lessons:
Metaheuristics have many parameters and design choices. Can machine learning automate algorithm selection and tuning?
Algorithm that selects which heuristic to use at each iteration.
Approach: Learn from performance history which operator works best now
Meta-level learning across problem instances
State: Current solution quality, diversity metrics
Actions: Select mutation, crossover, restart, etc.
Learn reward function: improvement per computational cost
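A very small version of "learn which operator works best now" is an epsilon-greedy bandit over operators; this is a hedged sketch of the idea (the operator names and reward definition are illustrative, not a specific published hyper-heuristic):

```python
import random

def make_selector(operators, eps=0.2):
    """Epsilon-greedy operator selection: track the running average reward
    (e.g., fitness improvement per call) of each operator; exploit the
    best-scoring one with probability 1-eps, explore randomly otherwise."""
    stats = {op: [0.0, 0] for op in operators}   # op -> [total reward, uses]

    def choose():
        if random.random() < eps:
            return random.choice(operators)      # explore
        return max(operators,
                   key=lambda op: stats[op][0] / max(1, stats[op][1]))

    def feedback(op, reward):
        stats[op][0] += reward                   # learn from outcomes
        stats[op][1] += 1

    return choose, feedback

random.seed(0)
choose, feedback = make_selector(["mutate", "crossover", "restart"])
op = choose()
feedback(op, reward=1.0)
```

Dividing the reward by the operator's runtime would give the improvement-per-cost signal described above.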
Vision: Given problem features, predict best algorithm + parameters
Emerging Tools:
Concept: Quantum computers exploit superposition to explore solution space
Current: D-Wave quantum annealers for specific Ising problems
Opportunity: Map optimization problems to QUBO (quadratic unconstrained binary)
Still experimental, limited problem classes
Concept: Biologically-inspired hardware (spiking neural networks)
Current: Intel Loihi, IBM TrueNorth event-driven processors
Advantage: Ultra-low power, massive parallelism
Challenge: Mapping optimization algorithms to spiking paradigm
Focus: Genetic algorithms, evolutionary strategies
Features: Easy representation, crossover/mutation operators
pip install deap
Focus: Multi-objective optimization, many algorithms
Features: GA, PSO, DE, NSGA-II pre-implemented
pip install pygmo
Focus: MOO algorithms, visualization
Features: NSGA-II, MOEA/D, Pareto plotting
pip install platypus-opt
TSPLIB: Traveling Salesman Problem instances - 150+ problems, known optimal solutions
BBOB (Black-Box Optimization Benchmarking): 24 continuous test functions, performance metrics
CEC Benchmarks: Congress on Evolutionary Computation test suites (annual updates)
Kaggle Competitions: Real-world optimization problems with public leaderboards
Sections 1-3: Foundation of why optimization is hard, classical approaches, single-trajectory escape mechanisms
Sections 4-5: Population-based and collective intelligence methods, genetic and swarm algorithms
Sections 6-7: Modern enhancements, problem encoding, constraint handling, tuning methodology
Sections 8-10: Practical applications, real benchmarks, future research directions