17  Optimization

Yun-Tien Lee and Alec Loudenback

Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. - John Tukey, 1962

17.1 Chapter Overview

Optimization as root finding or minimization/maximization of defined objectives. Differentiable programming and its benefits for optimization problems. Other, non-gradient-based optimization approaches. Model fitting as an optimization problem.

17.2 Introduction

Local and global optimization: A local optimum is a solution where the objective function (or cost function) takes the best possible value within a neighborhood of that solution. A global optimum, on the other hand, is the best possible value of the objective function across the entire feasible domain. For smooth, convex functions, following the negative gradient leads to the global minimum (or, following the gradient itself, the global maximum), making gradient-based search extremely efficient. Even for non-convex functions, the gradient provides valuable local information about which direction improves the objective function value.

using CairoMakie
let

    f(x) = x^8 - 3x^4 + x

    # Generate a range of x and y values
    xs = range(-1.5, 1.5, length=300)
    ys = f.(xs)

    fig = Figure(size=(600, 400))
    ax = Axis(fig[1, 1],
        xlabel="x",
        ylabel="f(x)",
        title="Function with Local and Global Optima",
        limits=(-1.5, 1.5, -5, 3)
    )

    lines!(ax, xs, ys, color=:blue)

    x_global_min = -1.1226

    scatter!(ax,
        [x_global_min],
        [f(x_global_min)],
        color=:red,
        markersize=10,
        label="Global Minimum"
    )

    x_local_min = [1.08835]

    scatter!(ax,
        x_local_min,
        f.(x_local_min),
        color=:black,
        markersize=10,
        label="Local Minimum"
    )

    # Add a legend
    axislegend(ax, position=:ct)

    fig
end

Optimization techniques are incredibly important and find uses in many areas:

  • Machine learning relies on “training”, which is essentially optimizing parameters to match target data.
  • Determining parameters for market-consistent models where the modeled price is fit to market-observed prices.
  • Minimizing a certain risk outcome by optimizing asset allocations.
  • Maximizing risk-adjusted yield in a portfolio.

We will introduce some basic concepts and categories of optimization techniques in this chapter:

  • Objective and Loss Functions
  • Optimization Techniques
    • Nonlinear Optimization
      • Gradient Based Techniques
      • Bracketing Methods
      • Other Techniques
    • Linear Optimization

17.3 Objective and Loss Functions

Any optimization algorithm needs to know what is being optimized. We call this function the objective function. Usually, the objective function is something we want to minimize—though maximizing a function is mathematically equivalent to minimizing its negative. Concretely, if you want to maximize \(f(x)\), you can minimize \(-f(x)\). The optimal point (also known as the \(\arg\min\)) is simply the input value that produces the smallest objective value.
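
As a quick concrete check of this equivalence, a brute-force search over a grid of candidate inputs (a minimal sketch with a made-up function) recovers the same point whether we maximize \(f\) or minimize \(-f\):

f(x) = -(x - 1)^2 + 3            # maximized at x = 1
xs = range(-2, 2, length=401)    # grid of candidate inputs
x_max = xs[argmax(f.(xs))]       # arg max of f
x_min = xs[argmin(-f.(xs))]      # arg min of -f: the same point
println(x_max == x_min)          # true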

In many practical situations—especially those with noisy or uncertain data—the function we want to optimize is called a loss function. This term is common in statistical modeling and machine learning, where the aim is to measure how far a model’s predictions deviate from real observations. For instance, the following squared loss function computes the sum of squared errors between a model’s predictions and the targets:

loss(f, guess, target) = sum((f.(guess) .- target).^2)

Here, guess might represent model parameters (or input values), and target is the observed data. Minimizing this loss aligns the model’s predictions with reality as closely as possible.
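
For instance (a minimal usage sketch with made-up numbers), scoring a hypothetical model \(f(x) = 2x\) against three observations:

model(x) = 2x                    # hypothetical model
inputs = [1.0, 2.0, 3.0]         # model inputs
observed = [2.1, 3.9, 6.2]       # observed data
loss(model, inputs, observed)    # ≈ 0.06, the sum of squared errors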

17.4 Optimization Techniques

17.4.1 Non-Linear Optimization

Non-linear optimization refers to problems where the relationship between inputs and outputs is not constrained to be linear, a situation that is incredibly common in financial modeling.

In many financial settings, variables must obey certain bounds or relationships (e.g., portfolios cannot exceed a total budget, or a particular risk measure must remain below a threshold). Handling constraints often involves methods like Lagrange multipliers, projected gradients, or penalty functions. We won’t cover these in depth here, but be aware that most general optimization libraries can handle constraints in non-linear problems as well.
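
As a minimal sketch of the penalty-function idea (the functions and the weight ρ below are illustrative choices, not a library API): to minimize \(f(x)\) subject to \(g(x) \leq 0\), add a term punishing constraint violations and solve the resulting unconstrained problem.

using Optim

# Minimize (x₁ - 2)² + (x₂ - 2)² subject to x₁ + x₂ ≤ 1 via a quadratic penalty
f(x) = (x[1] - 2)^2 + (x[2] - 2)^2
g(x) = x[1] + x[2] - 1                     # constraint: g(x) ≤ 0
ρ = 1_000.0                                # penalty weight (a tuning choice)
penalized(x) = f(x) + ρ * max(0.0, g(x))^2
result = optimize(penalized, [0.0, 0.0], NelderMead())
println(Optim.minimizer(result))           # ≈ [0.5, 0.5], the constrained optimum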

17.4.1.1 Gradient-Based Optimization

The most efficient non-linear optimization algorithms tend to utilize the gradient (i.e., the multivariable derivative) to make the search substantially more efficient. The gradient is useful because a zero gradient identifies a candidate maximum or minimum (a critical point), and because algorithms can be ‘smarter’ about searching for a solution using the additional information. The gradient provides the direction of steepest ascent of a function, so optimization algorithms often iteratively update parameters in the direction opposite to the gradient (for minimization problems), which tends to converge towards a local minimum (or maximum, for maximization problems). Moreover, computing the gradient is often computationally feasible and relatively inexpensive compared to other ways of probing function behavior, such as higher-order derivatives. Beyond the direction, the magnitude (norm) of the gradient indicates how steep the function change is in that direction; this information is used to adjust step sizes in optimization algorithms, balancing convergence speed against stability.
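
To make the core loop concrete, here is a minimal gradient-descent sketch (a toy function with a hand-coded derivative and a fixed step size; practical libraries choose step sizes adaptively):

# Minimize f(x) = (x - 3)² by repeatedly stepping against the gradient
f(x) = (x - 3)^2
f′(x) = 2 * (x - 3)    # analytic derivative
x = 0.0                # starting point
η = 0.1                # step size (learning rate)
for _ in 1:100
    x -= η * f′(x)     # move opposite the gradient
end
println(x)             # ≈ 3.0, the minimizer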

Calculating gradients in the context of computer algorithms is discussed at length in Chapter 16, but a quick recap of the available approaches:

  • Finite differences: evaluate the function at nearby points and determine the rates of change associated with each change in direction.
  • Analytic derivatives: a human or a computer algebra tool (such as Mathematica) analytically derives the derivative, which is then coded into the optimization algorithm.
  • Automatic differentiation (AD): elementary rules are applied to decompose elementary code operations into derivatives, allowing for very efficient computation of complex algorithms.

To compare the approaches, here is an example of determining the derivative of a simple function at a certain point:

using Zygote

# Define a differentiable function
f(x) = 3x^2 + 2x + 1
# Define an input value
x = 2.0

finite_diff = (f(x + 0.001) - f(x)) / 0.001

println("Value of f(x) at x=", f(x))
println("Gradient of f(x) at x=", finite_diff)      # (1)
println("Gradient of f(x) at x=", gradient(f, x))   # (2)
println("Gradient of f(x) at x=", 6 * x + 2)        # (3)

1. A finite-difference approach to determining a gradient. Note that this method requires two function calls of f to calculate an approximation.
2. An automatic, exact derivative determined by the rules in the Zygote library.
3. An analytic derivative using a user-coded derivative.
Value of f(x) at x=17.0
Gradient of f(x) at x=14.002999999998877
Gradient of f(x) at x=(14.0,)
Gradient of f(x) at x=14.0
17.4.1.1.1 Root finding

Root finding, also known as root approximation or root isolation, is the process of finding the values of the independent variable (usually denoted as \(x\)) for which a given function equals zero. In mathematical terms, if we have a function \(f(x)\), root finding involves finding values of \(x\) such that \(f(x)=0\).

There are various algorithms for root finding, each with its own advantages and disadvantages depending on the characteristics of the function and the requirements of the problem. One notable approach is Newton’s method, an iterative method that uses the derivative or gradient of the function to approximate the root with increasing accuracy in each iteration.
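
Each Newton iteration applies the update

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \]

which is exactly the step computed inside the loop below.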

We will again use a simple function to illustrate the process:

using Zygote

# Define a differentiable function
f(x) = 2x^2 - 3x + 1
# Define an initial value
x = 0.0
# tolerance of difference in value
tol = 1e-6
# maximum number of iteration of the algorithm
max_iter = 1000
iter = 0
while abs(f(x)) > tol && iter < max_iter
    x -= f(x) / gradient(f, x)[1]
    iter += 1
end
if iter == max_iter
    println("Warning: Maximum number of iterations reached.")
else
    println("Root found after ", iter, " iterations.")
end
print("Approximate root: ", x)
Root found after 5 iterations.
Approximate root: 0.4999999998835846

Although it might look different, root-finding (solving \(g(x)=0\)) can be cast as a minimization problem by defining an objective function such as \((g(x))^2\). In this view, driving \((g(x))^2\) to zero compels \(g(x)\) itself to be zero, so the methods and algorithmic ideas from minimization apply naturally to root-finding scenarios as well.
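
To see this concretely, the root near \(x = 0.5\) from the Newton example above can be recovered by minimizing \(g(x)^2\) with a univariate optimizer (a minimal sketch using the Optim package introduced below):

using Optim

g(x) = 2x^2 - 3x + 1                        # same function as the Newton example
result = optimize(x -> g(x)^2, -1.0, 0.75)  # minimize g(x)² over a bracket
println(Optim.minimizer(result))            # ≈ 0.5, matching the root above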

17.4.1.1.2 BFGS

BFGS—named for Broyden, Fletcher, Goldfarb, and Shanno—is a popular member of the quasi–Newton family of optimization algorithms. Although it does require first‐order gradient information, BFGS does not need the exact Hessian (i.e., second derivatives). Instead, it updates an approximation to the inverse Hessian at each step using the gradients from previous iterations. This extra curvature information allows BFGS to converge much faster than steepest descent in practice. Moreover, because it never explicitly forms the full Hessian matrix, it remains efficient for moderately sized problems. In finance and actuarial settings, BFGS can be especially useful for model calibration or parameter estimation tasks where one needs to handle nonlinear functions relatively quickly but cannot afford the computational overhead of second derivatives.
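
For reference, writing \(s_k = x_{k+1} - x_k\) and \(y_k = \nabla f(x_{k+1}) - \nabla f(x_k)\), the standard BFGS update of the inverse-Hessian approximation \(H_k\) is

\[ H_{k+1} = \left(I - \rho_k s_k y_k^\top\right) H_k \left(I - \rho_k y_k s_k^\top\right) + \rho_k s_k s_k^\top, \qquad \rho_k = \frac{1}{y_k^\top s_k} \]

so only gradient differences, never second derivatives, enter the computation.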

The following example uses the Optim package available in Julia to perform BFGS optimization. We don’t illustrate the complete algorithm, as it is a bit longer, but we want to ensure this workhorse of an algorithm is mentioned.

using Optim

# Define the objective function to minimize
function objective_function(x)
    return sum(x .* x)
end

# Initial guess for the minimization
initial_x = [1.0]
# Perform optimization using BFGS method
result = optimize(objective_function, initial_x, BFGS())
# Extract the optimized solution
solution = result.minimizer
minimum_value = result.minimum

# Print the result
println("Optimized solution: x = ", solution)
println("Minimum value found: ", minimum_value)
Optimized solution: x = [-6.359357485052897e-13]
Minimum value found: 4.0441427622698303e-25

17.4.1.2 Gradient-Free Optimization

This category includes algorithms that do not rely on gradients or derivative information. They often explore the objective function using heuristics or other types of probes to guide the search.

17.4.1.2.1 Bracketing Methods

A bracketed search algorithm is a technique used in optimization and numerical methods to confine or “bracket” a minimum or maximum of a function within a specified interval. The primary goal is to reduce the search space systematically until a satisfactory solution or range containing the optimal value is found.

"""
    bisection_method(f, a, b; tol=1e-6, max_iter=100)

Bisection method to find a root of the function f(x) within the interval [a, b].

Parameters:
- f: Function to find the root of.
- a, b: Initial interval [a, b] where the root is expected to be.
- tol: Tolerance for the root (default is 1e-6).
- max_iter: Maximum number of iterations allowed (default is 100).

Returns:
- root: Approximate root found within the tolerance.
- iterations: Number of iterations taken to converge.
"""
function bisection_method(f, a, b; tol=1e-6, max_iter=100)
    fa = f(a)
    fb = f(b)
    if fa * fb > 0
        error("The function values at the endpoints must have opposite signs.")
    end
    iterations = 0
    while (b - a) / 2 > tol && iterations < max_iter
        c = (a + b) / 2
        fc = f(c)
        if fc == 0
            return c, iterations
        end
        if fa * fc < 0
            b = c
            fb = fc
        else
            a = c
            fa = fc
        end
        iterations += 1
    end
    root = (a + b) / 2
    return root, iterations
end

# Define the function we want to find the root of
function f(x)
    return x^3 - 6x^2 + 11x - 6.1
end

# Initial interval [a, b] and tolerance
a = 0.5
b = 10
tolerance = 1e-6
# Apply the bisection method
root, iterations = bisection_method(f, a, b, tol=tolerance)

# Print results
println("Approximate root: ", root)
println("Iterations taken: ", iterations)
println("Function value at root: ", f(root))
Approximate root: 3.046680122613907
Iterations taken: 23
Function value at root: -9.356632642010254e-7

A popular practical algorithm is Brent’s method, which uses additional heuristics to accelerate the optimization routine in most cases.
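
For example, Optim.jl exposes Brent’s method for univariate minimization over a bracketing interval (a minimal sketch):

using Optim

h(x) = (x - 2)^2 + 1                  # minimized at x = 2
result = optimize(h, 0.0, 5.0, Brent())
println(Optim.minimizer(result))      # ≈ 2.0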

17.4.1.3 Other Non-Gradient Based Optimization Techniques

17.4.1.3.1 Nelder-Mead simplex method

The Nelder-Mead simplex method is a popular optimization algorithm used for minimizing (or maximizing) nonlinear functions that are not necessarily differentiable. It’s particularly useful when gradient-based methods cannot be applied, and it is often used in low-dimensional problems due to its simplicity and robustness. We will use the Rosenbrock function, which can be useful in certain portfolio optimization problems, to illustrate the process.

using Optim

# Define the Rosenbrock function
function rosenbrock(v)
    x, y = v[1], v[2]
    return (1 - x)^2 + 100 * (y - x^2)^2
end

# Initial guess for (x, y)
initial_guess = [-1.5, 2.0]

# Perform optimization using the Nelder-Mead method
result = optimize(rosenbrock, initial_guess, NelderMead())

# Extract results
optimal_point = Optim.minimizer(result)
minimum_value = Optim.minimum(result)

println("Optimal Point: ", optimal_point)
println("Minimum Value: ", minimum_value)
Optimal Point: [0.9999913430783984, 0.9999847519696222]
Minimum Value: 5.016695917763382e-10
17.4.1.3.2 Simulated annealing

Simulated Annealing (SA) is a probabilistic optimization technique inspired by the annealing process in metallurgy. It is used to find near-optimal solutions to optimization problems, particularly in cases where traditional gradient-based methods may get stuck in local minima/maxima. SA accepts worse solutions with a certain probability, allowing it to explore the search space broadly at first and then gradually narrow in on better solutions as it progresses. In this section the Rastrigin function, a standard test function with many local minima, is used to illustrate the process.
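
The acceptance rule (the Metropolis criterion, implemented in the loop below) accepts a candidate whose objective value changes by \(\Delta E\) at temperature \(T\) with probability

\[ P(\text{accept}) = \begin{cases} 1 & \text{if } \Delta E < 0 \\ e^{-\Delta E / T} & \text{otherwise} \end{cases} \]

so worse moves become increasingly unlikely as the temperature cools.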

using Random

Random.seed!(1234)

# Parameters
max_iterations = 1000            # Number of iterations
initial_temperature = 100.0      # Starting temperature
cooling_rate = 0.99              # Cooling rate (temperature multiplier)
bounds = (-5.12, 5.12)           # Bounds for the search space
dimension = 5                    # Number of dimensions in the search space

# Objective function: Rastrigin function
function rastrigin(x)
    A = 10
    n = length(x)
    return A * n + sum(xi^2 - A * cos(2 * π * xi) for xi in x)
end

# Random initialization within bounds
function initialize_solution()
    return rand(bounds[1]:0.01:bounds[2], dimension)
end

# Random perturbation within bounds
function perturb_solution(solution)
    perturbed = copy(solution)
    index = rand(1:dimension)
    perturb_amount = rand(-0.1:0.01:0.1)  # Small random change
    perturbed[index] += perturb_amount
    # Ensure perturbed solution is within bounds
    perturbed[index] = clamp(perturbed[index], bounds[1], bounds[2])
    return perturbed
end

# Simulated Annealing main function
function simulated_annealing()
    current_solution = initialize_solution()
    current_value = rastrigin(current_solution)
    best_solution = copy(current_solution)
    best_value = current_value
    temperature = initial_temperature

    for iteration in 1:max_iterations
        # Generate new candidate solution by perturbation
        candidate_solution = perturb_solution(current_solution)
        candidate_value = rastrigin(candidate_solution)

        # Acceptance probability (Metropolis criterion)
        ΔE = candidate_value - current_value
        if ΔE < 0 || rand() < exp(-ΔE / temperature)
            current_solution = candidate_solution
            current_value = candidate_value
        end

        # Update best solution found so far
        if current_value < best_value
            best_solution = copy(current_solution)
            best_value = current_value
        end

        # Decrease temperature
        temperature *= cooling_rate
        if iteration % 100 == 0
            println("Iteration $iteration: Best Value = $best_value, Temperature = $temperature")
        end
    end

    return best_solution, best_value
end

# Run the simulated annealing algorithm
best_solution, best_value = simulated_annealing()
println("Best Solution: ", best_solution)
println("Best Value (Minimum): ", best_value)
Iteration 100: Best Value = 79.6456061966745, Temperature = 36.60323412732294
Iteration 200: Best Value = 68.74807121999402, Temperature = 13.397967485796167
Iteration 300: Best Value = 35.612137805380314, Temperature = 4.9040894071285726
Iteration 400: Best Value = 22.630909076712427, Temperature = 1.7950553275045138
Iteration 500: Best Value = 22.116366182140844, Temperature = 0.6570483042414603
Iteration 600: Best Value = 22.11468680700349, Temperature = 0.24050092913110663
Iteration 700: Best Value = 21.995266535865557, Temperature = 0.08803111816824594
Iteration 800: Best Value = 21.93833868942773, Temperature = 0.03222223628802339
Iteration 900: Best Value = 21.918338689427735, Temperature = 0.011794380589564411
Iteration 1000: Best Value = 21.918338689427735, Temperature = 0.004317124741065788
Best Solution: [-0.9999999999999993, -1.3357370765021415e-16, -1.9799999999999993, 3.979999999999999, -0.9899999999999995]
Best Value (Minimum): 21.918338689427735
17.4.1.3.3 Particle swarm optimization (PSO)

Particle swarm optimization is a metaheuristic optimization algorithm inspired by the social behavior of birds flocking or fish schooling. It solves optimization problems by iteratively improving candidate solutions based on the velocity and position of particles (potential solutions) in the search space. The PSO algorithm differs from other methods in a key way: instead of updating a single candidate solution at each iteration, it updates a population (set) of candidate solutions, called a swarm. Each candidate solution in the swarm is called a particle. A swarm resembles an apparently disorganized population of moving individuals that tend to cluster together while each individual seems to move in a random direction.
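
Each particle \(i\) updates its velocity and position as

\[ \begin{aligned} v_i &\leftarrow \omega v_i + c_1 r_1 (p_i - x_i) + c_2 r_2 (g - x_i) \\ x_i &\leftarrow x_i + v_i \end{aligned} \]

where \(p_i\) is the particle’s personal best, \(g\) is the swarm’s global best, \(\omega\) is an inertia weight, \(c_1\) and \(c_2\) are learning factors, and \(r_1\) and \(r_2\) are uniform random draws. This is exactly the update implemented below.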

using Random, CairoMakie

# Define the Rosenbrock function
function rosenbrock(x)
    return (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
end

# PSO Implementation
function particle_swarm_optimization(objective, n_particles, n_iterations, bounds, dim)
    # Initialize particles
    positions = [rand(bounds[1]:0.1:bounds[2], dim) for _ in 1:n_particles]
    velocities = [rand(-1.0:0.1:1.0, dim) for _ in 1:n_particles]
    personal_best_positions = deepcopy(positions)
    personal_best_scores = [objective(p) for p in positions]
    global_best_position = personal_best_positions[argmin(personal_best_scores)]
    global_best_score = minimum(personal_best_scores)

    # PSO parameters
    ω = 0.5            # Inertia weight
    c1, c2 = 2.0, 2.0  # Cognitive and social learning factors

    # Optimization loop
    for iter in 1:n_iterations
        for i in 1:n_particles
            # Update velocity
            r1, r2 = rand(), rand()
            velocities[i] .= ω .* velocities[i] +
                             c1 * r1 .* (personal_best_positions[i] - positions[i]) +
                             c2 * r2 .* (global_best_position - positions[i])

            # Update position
            positions[i] .= positions[i] .+ velocities[i]

            # Clamp positions within bounds
            positions[i] .= clamp.(positions[i], bounds[1], bounds[2])

            # Evaluate fitness
            score = objective(positions[i])
            if score < personal_best_scores[i]
                personal_best_positions[i] = deepcopy(positions[i])
                personal_best_scores[i] = score
            end

            if score < global_best_score
                global_best_position = deepcopy(positions[i])
                global_best_score = score
            end
        end
        if iter % 100 == 0
            println("Iteration $iter: Best Score = $global_best_score")
        end
    end
    return global_best_position, global_best_score
end

# Parameters
n_particles = 30
n_iterations = 100
bounds = (-2.0, 2.0)
dim = 2

# Run PSO
best_position, best_score = particle_swarm_optimization(rosenbrock, n_particles, n_iterations, bounds, dim)

println("Best Position: $best_position")
println("Best Score: $best_score")

# Visualization
x = -2.0:0.05:2.0
y = -1.0:0.05:3.0
# Evaluate the objective on the grid; the first matrix axis follows x, as heatmap! expects
Z = [rosenbrock([xᵢ, yᵢ]) for xᵢ in x, yᵢ in y]

# Heatmap and Scatter Plot
fig = Figure()
ax = Axis(fig[1, 1], title="PSO Optimization of Rosenbrock", xlabel="x", ylabel="y")
heatmap!(ax, x, y, Z, colormap=:viridis)
scatter!(ax, [best_position[1]], [best_position[2]], color=:red, markersize=10, label="Optimal Solution")
axislegend(ax)
fig
Iteration 100: Best Score = 5.829480589621494e-19
Best Position: [1.0000000001510196, 1.0000000003768819]
Best Score: 5.829480589621494e-19
17.4.1.3.4 Evolutionary algorithms

Evolutionary algorithms (EAs) are a family of optimization algorithms inspired by the principles of biological evolution. They are particularly useful for solving complex optimization problems where traditional gradient-based methods may struggle due to non-linearity, multimodality, or high dimensionality of the search space.

The following shows an example to maximize population fitness in terms of an objective function, with common crossover and mutation processes throughout all generations.

using Random

Random.seed!(1234)

# Parameters
population_size = 50           # Number of individuals in the population
chromosome_length = 5          # Number of genes in each individual (dimensionality)
generations = 100              # Number of generations
mutation_rate = 0.1            # Probability of mutation
crossover_rate = 0.7           # Probability of crossover
bounds = (-5.12, 5.12)         # Boundaries for each gene

# Target function: Rastrigin function
function rastrigin(x)
    A = 10
    n = length(x)
    return A * n + sum(xᵢ^2 - A * cos(2 * π * xᵢ) for xᵢ in x)
end

# Initialize population randomly within bounds
function initialize_population()
    return [rand(bounds[1]:0.01:bounds[2], chromosome_length) for _ in 1:population_size]
end

# Fitness function (negative because we are minimizing)
function fitness(individual)
    return -rastrigin(individual)
end

# Selection: Tournament selection
function tournament_selection(population, fitnesses)
    candidates = rand(1:population_size, 2)
    return ifelse(fitnesses[candidates[1]] > fitnesses[candidates[2]],
        population[candidates[1]], population[candidates[2]])
end

# Crossover: Single-point crossover
function crossover(parent1, parent2)
    if rand() < crossover_rate
        point = rand(1:chromosome_length)
        child1 = vcat(parent1[1:point], parent2[point+1:end])
        child2 = vcat(parent2[1:point], parent1[point+1:end])
        return child1, child2
    else
        # Copy so that mutation below cannot modify the current population in place
        return copy(parent1), copy(parent2)
    end
end

# Mutation: Randomly change genes with some probability
function mutate(individual)
    for i in 1:chromosome_length
        if rand() < mutation_rate
            individual[i] = rand(bounds[1]:0.01:bounds[2])
        end
    end
    return individual
end

# Main Genetic Algorithm loop
function genetic_algorithm()
    population = initialize_population()
    best_individual = nothing
    best_fitness = -Inf

    for gen in 1:generations
        # Evaluate fitness
        fitnesses = [fitness(ind) for ind in population]

        # Find best individual in current population
        current_best = argmax(fitnesses)
        if fitnesses[current_best] > best_fitness
            best_fitness = fitnesses[current_best]
            best_individual = population[current_best]
        end

        # Generate new population
        new_population = []
        while length(new_population) < population_size
            # Selection
            parent1 = tournament_selection(population, fitnesses)
            parent2 = tournament_selection(population, fitnesses)

            # Crossover
            child1, child2 = crossover(parent1, parent2)

            # Mutation
            child1 = mutate(child1)
            child2 = mutate(child2)

            # Add children to new population
            push!(new_population, child1, child2)
        end
        population = new_population[1:population_size]

        if gen % 100 == 0
            println("Generation $gen: Best Fitness = ", best_fitness)
        end
    end

    return best_individual, -best_fitness
end

# Run the genetic algorithm
best_solution, best_value = genetic_algorithm()
println("Best Solution: ", best_solution)
println("Best Value (Minimum): ", best_value)
Generation 100: Best Fitness = -2.9892957379930465
Best Solution: [-0.02, -1.02, -0.04, 0.02, -1.04]
Best Value (Minimum): 2.9892957379930465
17.4.1.3.5 Bayesian optimization

Bayesian Optimization (BO) is a powerful technique for global optimization of expensive-to-evaluate black-box functions. It leverages probabilistic models to predict the objective function’s behavior across the search space and uses these models to make informed decisions about where to evaluate the function next. This approach efficiently balances exploration (searching for promising regions) and exploitation (exploiting regions likely to yield optimal values), making it particularly suitable for optimization problems where function evaluations are costly, such as tuning hyperparameters of machine learning models or optimizing parameters of complex simulations.

using Random

# Define your objective function to be optimized
function objective(x::Float64)
    return -(x^2 + 0.1 * sin(5 * x))  # Example objective function (negative because we seek maximum)
end
# Bayesian optimization function
function bayesian_optimization(objective, bounds::Tuple{Float64,Float64}, num_iterations::Int)
    Random.seed!(1234)  # Setting a seed for reproducibility
    X = Float64[]  # List to store evaluated points
    Y = Float64[]  # List to store objective values
    # Initial random point (you can choose other initial points as well)
    x_init = rand() * (bounds[2] - bounds[1]) + bounds[1]
    push!(X, x_init)
    push!(Y, objective(x_init))
    # Main loop
    for i in 1:num_iterations
        # A full implementation would fit a surrogate model (e.g., a Gaussian
        # process) to the observed data and pick the next point by maximizing
        # an acquisition function; for simplicity this sketch samples randomly
        x_next = rand() * (bounds[2] - bounds[1]) + bounds[1]  # Random sampling
        # Evaluate the objective function at the chosen point
        y_next = objective(x_next)
        # Update the data with the new observation
        push!(X, x_next)
        push!(Y, y_next)
        # Here, we will just print the current best observed value
        println("Iteration $i: Best value = $(maximum(Y))")
    end
    # Return the best observed value and corresponding parameter
    best_idx = argmax(Y)
    return X[best_idx], Y[best_idx]
end

best_x, best_value = bayesian_optimization(objective, (-5.0, 5.0), 10)
println("Best x found: $best_x, Best value: $best_value")
Iteration 1: Best value = -0.3041807254074535
Iteration 2: Best value = -0.3041807254074535
Iteration 3: Best value = -0.3041807254074535
Iteration 4: Best value = -0.3041807254074535
Iteration 5: Best value = -0.3041807254074535
Iteration 6: Best value = -0.3041807254074535
Iteration 7: Best value = -0.3041807254074535
Iteration 8: Best value = 0.02504980758369785
Iteration 9: Best value = 0.02504980758369785
Iteration 10: Best value = 0.02504980758369785
Best x found: -0.057501331095793695, Best value: 0.02504980758369785

While gradient-free methods help optimize complex non-linear or noisy functions, some financial and operational problems are best modeled with linear relationships. The next section shows how to set up such problems, including constraints, using linear and integer programming.

17.4.2 Linear optimization

Linear optimization, also known as linear programming (LP), is a mathematical method for finding the best outcome in a mathematical model with linear relationships. It involves optimizing a linear objective function subject to a set of linear equality and inequality constraints. Linear programming has a wide range of applications across various fields, including operations research, economics, engineering, and logistics.

We will use linear optimization to solve the following problem, with \(n\) the number of elements in \(b\):

\[ \begin{aligned} \max_{x} \quad & c \cdot x \\ \text{subject to} \quad & x \geq 0 \\ & A_i \cdot x \leq b_i \quad \forall i \in \{1, 2, \ldots, n\} \end{aligned} \]

using JuMP, GLPK, LinearAlgebra

# Define the objective coefficients
c = [1.0, 2.0, 3.0]
# Define the constraint matrix (A) and right-hand side (b)
A = [1.0 1.0 0.0;
    0.0 1.0 1.0]
b = [10.0, 20.0]
# Create a JuMP model
linear_model = Model(GLPK.Optimizer)
# Define decision variables
@variable(linear_model, x[1:3] >= 0)
# Define objective function
@objective(linear_model, Max, dot(c, x))
# Add constraints
@constraint(linear_model, constr[i=1:2], dot(A[i, :], x) <= b[i])
# Solve the optimization problem
optimize!(linear_model)

# Print results
println("Objective value: ", objective_value(linear_model))
println("Optimal solution:")
for i in 1:3
    println("\tx[$i] = ", value(x[i]))
end
Objective value: 70.0
Optimal solution:
    x[1] = 10.0
    x[2] = 0.0
    x[3] = 20.0

17.4.2.1 Integer programming

Integer programming (IP) is a type of optimization problem where some or all of the variables are restricted to be integers. Although the problem definition looks similar to an LP, the complexity of solving an IP increases substantially because the solution space is discrete rather than continuous.

Let us use IP to solve this problem. A factory produces two types of products \(x_1\) and \(x_2\) with the following details:

\[ \begin{aligned} \max_{x} \quad & 40x_1 + 50x_2 \\ \text{subject to} \quad & x_1, x_2 \in \mathbb{Z} \\ & 4x_1 + 3x_2 \leq 200 \quad \text{(labor)} \\ & x_1 + 2x_2 \leq 40 \quad \text{(material)} \end{aligned} \]

# Import necessary packages
using JuMP, GLPK

# Create a model with the GLPK solver
model = Model(GLPK.Optimizer)

# Define decision variables (x₁ and x₂ are integers)
@variable(model, x₁ >= 0, Int)
@variable(model, x₂ >= 0, Int)

# Define the objective function (maximize profit)
@objective(model, Max, 40 * x₁ + 50 * x₂)

# Add constraints
@constraint(model, 4x₁ + 3x₂ <= 200)  # Labor constraint
@constraint(model, x₁ + 2x₂ <= 40)   # Material constraint

# Solve the model
optimize!(model)

# Check the solution status
if termination_status(model) == MOI.OPTIMAL
    println("Optimal solution found!")
    println("x₁ (Product x₁ units): ", value(x₁))
    println("x₂ (Product x₂ units): ", value(x₂))
    println("Maximum Profit: ", objective_value(model))
else
    println("No optimal solution found.")
end
Optimal solution found!
x₁ (Product x₁ units): 40.0
x₂ (Product x₂ units): 0.0
Maximum Profit: 1600.0

17.5 Example: Model fitting

In model fitting, the “best fitting curve” refers to the curve or function that best describes the relationship between the independent and dependent variables in the data. The goal of model fitting is to find the parameters of the chosen curve or function that minimize the difference between the observed data points and the values predicted by the model.

The process of finding the best fitting curve typically involves:

  • Choosing a model: Based on the nature of the data and the underlying relationship between the variables, a suitable model or family of models is selected.

  • Estimating parameters: Using the chosen model, one estimates the parameters that best describe the relationship between the variables. This is often done using optimization techniques such as least squares regression, maximum likelihood estimation, or Bayesian inference.

  • Evaluating the fit: Once the parameters are estimated, one evaluates the goodness of fit of the model by comparing the predicted values to the observed data. Common metrics for evaluating fit, or error functions, include the residual sum of squares, the coefficient of determination (R-squared), and visual inspection of the residuals.

  • Iterating if necessary: If the fit is not satisfactory, one may need to iterate on the model or consider alternative models until a satisfactory fit to the data is found.

While we have shown general approaches like BFGS or gradient‐free schemes, libraries such as LsqFit wrap these concepts into convenient functions. Under the hood, these packages may employ gradient‐based methods (including automatic differentiation) to refine parameters. Below is a demonstration:

using LsqFit, CairoMakie

x_data = 0:0.1:10
y_data = 2 .* sin.(x_data) .+ 0.5 .* randn(length(x_data))
# Define the model function
curve_model(x, p) = p[1] .* x .^ 5 .+ p[2] .* x .^ 4 .+ p[3] .* x .^ 3 .+ p[4] .* x .^ 2 .+ p[5] .* x .+ p[6]
# Initial parameter guess
p₀ = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
# Fit the model to the data
fit_result = curve_fit(curve_model, x_data, y_data, p₀)
# Extract the fitted parameters
params = coef(fit_result)
# Evaluate the model with the fitted parameters
y_fit = curve_model(x_data, params)
# Plot the data and the fitted curve
fig = Figure()
Axis(fig[1, 1], title="Curve Fitting - a polynomial guess-fitting a sinusoidal function")
scatter!(x_data, y_data, label="Data")
lines!(x_data, y_fit, label="Fitted Curve", linestyle=:dash, color=:red)
fig
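
To quantify the fit numerically (a short follow-on sketch; LsqFit stores the residuals in the fit result), we can compute the residual sum of squares and the R-squared metric mentioned above:

# Goodness of fit: residual sum of squares and R²
ss_res = sum(abs2, fit_result.resid)
ss_tot = sum(abs2, y_data .- sum(y_data) / length(y_data))
r_squared = 1 - ss_res / ss_tot
println("R² = ", r_squared)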

17.6 More Resources

The textbook “Algorithms for Optimization” (by Kochenderfer and Wheeler) is a comprehensive introduction to optimization and uses Julia for its examples.

In Julia, Optimization.jl provides a unified front-end for all kinds of general optimization problems. JuMP.jl provides a unified front-end through a specialized domain-specific language for mathematical optimization.