13  Elements of Computer Science

Author

Alec Loudenback

“Fundamentally, computer science is a science of abstraction—creating the right model for a problem and devising the appropriate mechanizable techniques to solve it. Confronted with a problem, we must create an abstraction of that problem that can be represented and manipulated inside a computer. Through these manipulations, we try to find a solution to the original problem.” — Al Aho and Jeff Ullman (1992)

13.1 Chapter Overview

Adapting computer science concepts to work for financial professionals. Computability, computational complexity, and the language of algorithms and problem solving. A survey of important data structures. More advanced testing and verification topics.

13.2 Computer Science for Financial Professionals

Computer science as a term can be a bit misleading because of the overwhelming association with the physical desktop or laptop machines that we call “computers”. The discipline is much richer than consumer electronics: at its core, computer science concerns itself with answering tough questions across several areas of research:

  • Algorithms and Optimization. How can a problem be solved efficiently? How can that problem be solved at all? Given constraints, how can one find an optimal solution?
  • Theory of Computation. What sorts of questions are even answerable? Is an answer easy to compute or will resolving it require more resources than the entire known universe? Will a computation ever stop calculating?
  • Data Structures. How to encode, store, and use data? How does that data relate to each other and what are the trade-offs between different representations of that data?
  • Information Theory1. Given limited data, what can be known or inferred from it?

For a reader in the twenty-first century, we hope it is patently obvious how impactful applied computer science has been: as end-users of the internet, artificial intelligence, computational photography, safety control systems, and more, we experience its effects daily. That impact is a testament to the utility of being able to harness computer science ideas for practical use.

It’s common for beneficial advances in knowledge and application to occur at the boundary between two disciplines. In this chapter we aim to bring the financial discipline together with computer science and to provide the financial practitioner with the language and concepts to leverage some of computer science’s most relevant ideas.

13.3 Algorithms

An Algorithm is a general term for a process that transforms an input into an output: the set of instructions dictating how to carry out that process. The process needs to be specified in sufficient detail to be called an algorithm (versus a heuristic, which lacks that specificity).

An algorithm might be the directions to accomplish the following task: summing a series of consecutive integers from \(1\) to \(n\). There are multiple ways that this might be accomplished, each one considered a distinct algorithm:

  • iterating over each number and summing them up (starting with the smallest number)

  • iterating over each number and summing them up (starting with the largest number)

  • summing up the evens and then the odds, then adding the two subtotals

  • and many more distinct algorithms…

We will look at some specific examples of these alternate algorithms as we introduce the next topic, computational complexity.

13.4 Complexity

13.4.1 Computational Complexity

We can characterize the computational complexity of a problem by looking at how long an algorithm takes to complete a task when given an input of size \(n\). We can then compare two approaches to see which is computationally less complex for a given \(n\). This is a way of systematically evaluating an algorithm to determine its efficiency when being computed.

Note that computational complexity isn’t quite the same as how fast an algorithm will run on your computer, but it’s a very good guide. Modern computer architectures can sometimes execute multiple instructions in a single CPU cycle, making an algorithm that is slower on paper actually run faster in practice. Additionally, some algorithms substantially limit the number of computations to be performed at the expense of using a lot more memory, thereby trading CPU usage for RAM usage.

You can think of computational complexity as a measure of how much work is to be performed. Sometimes the computer is able to perform certain kinds of work more efficiently.

Further, when we analyze an algorithm, recall that ultimately our code gets translated into instructions for the computer hardware. Many instructions are implemented such that, for a given type of number (e.g. floating point), it doesn’t matter whether the value is 1.0 or 0.41582574300044717: the operation (e.g. addition) takes exactly the same time and number of instructions to execute.

Sometimes a higher level operation is implemented in a way that takes many machine instructions. For example, division instructions may require many CPU cycles when compared to addition or multiplication. Sometimes this is an important distinction and sometimes not. For this book we will ignore this granularity of analysis.

13.4.1.1 Example: Sum of Consecutive Integers

Take for example the problem of determining the sum of integers from \(1\) to \(n\). We will explore three different algorithms and the associated computational complexity for them.

13.4.1.1.1 Constant Time

A mathematical proof can show a simple formula for the result. This allows us to compute the answer in constant time, which means that for any \(n\), our algorithm is essentially the same amount of work.

nsum_constant(n) = n * (n + 1) ÷ 2
nsum_constant (generic function with 1 method)

In this we see that we perform three operations (a multiplication, an addition, and a division) no matter what n is. Even if n is 10_000_000, we’d expect this to complete in about a single unit of time.

13.4.1.1.2 Linear Time

This algorithm performs a number of operations which grows in proportion with \(n\) by individually summing up each element in \(1\) through \(n\):

function nsum_linear(n)
    result = 0
    for i in 1:n
        result += i
    end

    result
end
nsum_linear (generic function with 1 method)

If \(n\) were 10_000_000, we’d expect it to run with roughly 10 million operations, or about 3 million times as many operations as the constant time version. We can say that this version of the algorithm will take approximately \(n\) steps to complete.

13.4.1.1.3 Quadratic Time

What if we were less efficient, and were only ever able to increment our subtotal by one? That is, instead of adding \(1+3\) in a single operation, we had to count up one at a time: \(1+1+1+1\). We can add a second loop which increments our result by one instead of simply adding the current i to the running total result. This makes our algorithm work much harder since it has to perform so many more additions (recall that to a computer, adding two numbers takes the same computational effort regardless of what the numbers are).

function nsum_quadratic(n)
    result = 0
    for i in 1:n        # the outer loop iterates i over the integers 1 to n
        for j in 1:i    # the inner loop does the busy work of adding 1 to our subtotal i times
            result += 1
        end
    end

    result
end
nsum_quadratic (generic function with 1 method)

Breaking down the steps:

  • When i is 1 there is 1 addition in the inner loop
  • When i is 2 there are 2 additions in the inner loop
  • When i is n there are n additions in the inner loop

Therefore, this computation takes \(1 + 2 + \cdots + n\) steps to complete. Algebraically, this sum is exactly the quantity our constant time formula computes: it requires \(n(n + 1) ÷ 2\), or \((n^2 + n) ÷ 2\), steps to complete.
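
As a quick sanity check (a small example of our own), all three implementations agree for a small input:

nsum_constant(10) == nsum_linear(10) == nsum_quadratic(10) == 55  # all return 55, the sum of 1 through 10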

13.4.1.2 Comparison

13.4.1.2.1 Big-O Notation

We can categorize the above implementations using a convention called Big-O Notation2, which is a way of distilling and classifying computational complexity. We characterize an algorithm by the most significant term in its total number of operations. Table 13.1 shows, for the examples constructed above, each algorithm’s computational cost, complexity description, Big-O order, and approximate number of steps at \(n = 10{,}000\).

Table 13.1: Complexity comparison for the three sample cases of summing integers from \(1\) to \(n\).

| Function | Computational Cost | Complexity Description | Big-O Order | Steps (\(n=10,000\)) |
|---|---|---|---|---|
| nsum_constant | fixed | Constant | \(O(1)\) | ~1 |
| nsum_linear | \(n\) | Linear | \(O(n)\) | ~10,000 |
| nsum_quadratic | \((n^2 + n) ÷ 2\) | Quadratic | \(O(n^2)\) | ~50,000,000 |

Table 13.2 shows a comparison of a more extended set of complexity levels. For the most complex categories of problems, the cost to compute grows so fast that it boggles the mind. What sorts of problems fall into the most complex categories? Examples of \(O(2^n)\), or exponential complexity, include the traveling salesman problem3 solved with dynamic programming, or the naive recursive approach to calculating the \(n\)th Fibonacci number. The beastly \(O(n!)\) algorithms include brute-force solving of the traveling salesman problem or enumerating all partitions of a set. In financial modeling, we may encounter these sorts of problems in portfolio optimization (e.g. the brute-force approach of testing every potential combination of assets to optimize a portfolio).

Table 13.2: Different Big-O Orders of Complexity

| Big-O Order | Description | \(n=10\) | \(n=1,000\) | \(n=1,000,000\) |
|---|---|---|---|---|
| \(O(1)\) | Constant Time | 1 | 1 | 1 |
| \(O(n)\) | Linear Time | 10 | 1,000 | 1,000,000 |
| \(O(n^2)\) | Quadratic Time | 100 | 1,000,000 | \(10^{12}\) |
| \(O(\log(n))\) | Logarithmic Time | 3 | 7 | 14 |
| \(O(n\log(n))\) | Linearithmic Time | 30 | 7,000 | 14,000,000 |
| \(O(2^n)\) | Exponential Time | 1,024 | ~\(10^{300}\) | ~\(10^{301029}\) |
| \(O(n!)\) | Factorial Time | 3,628,800 | ~\(10^{2567}\) | ~\(10^{5565708}\) |
Note

We care only about the most significant term because when \(n\) is large, the most significant term tends to dominate. For example, our quadratic time example takes \(\frac{n^2 + n}{2}\) steps. If \(n\) is a large number like \(10^6\), then:

\[\begin{align*} \frac{n^2 + n}{2} &= \frac{(10^6)^2 + 10^6}{2} \\ &= \frac{10^{12}}{2} + \frac{10^6}{2} \end{align*}\]

The \(\frac{10^{12}}{2}\) term is one million times larger than the \(\frac{10^6}{2}\) term, so the \(n^2\) contribution dominates. This is why Big-O notation reduces the problem down to only the most significant complexity cost term.

If \(n\) is small, then we generally don’t need to worry much about computational complexity. This is a lesson for our efforts as developers: focus on the most computationally intensive parts of calculations when looking to optimize, and don’t worry about seldom-used portions of the code.

13.4.1.2.2 Empirical Results

The preceding examples of constant, linear, and quadratic time are conceptually correct, but if we run them in practice we see that the description doesn’t seem to hold for the linear time version: it runs as quickly as the constant time version.

using BenchmarkTools
@btime nsum_constant(10_000)
  0.833 ns (0 allocations: 0 bytes)
50005000
@btime nsum_linear(10_000)
  1.500 ns (0 allocations: 0 bytes)
50005000
@btime nsum_quadratic(10_000)
  1.104 μs (0 allocations: 0 bytes)
50005000

What happened is that the compiler was able to understand and optimize the linear version, effectively transforming it into the constant time version and avoiding the iterative summation that we had written. For examples that are simple enough to use as teaching problems, the compiler can often optimize differently written code down to the same efficient machine code (this is the same Triangular Number optimization we saw in Section 5.4.3.4).

13.4.1.3 Expected versus worst-case complexity

Another consideration is that there may be one approach which performs better in the majority of cases, at the expense of having very poor performance in specific cases. Sometimes we may risk those high cost cases if we expect the benefit to be worthwhile on the rest of the problem set.

This often happens when the data we are working with has some concept of “distance”. For example, in multi-stop route planning we can use the idea that it’s likely more efficient to visit nearby destinations first. Generally this works, but sometimes the nearest destination actually has a high cost (such as real-world obstacles in the way which force you to drive past other, farther-away locations to reach it).
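
As a simpler illustration of case-dependent cost (our own sketch, separate from the route-planning example), a linear search does very different amounts of work depending on where the match happens to sit:

v = collect(1:1_000_000)
findfirst(==(1), v)          # best case: the very first element matches
findfirst(==(1_000_000), v)  # worst case: every element must be examined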

13.4.2 Space Complexity

So far we have focused on computational complexity, however similar analysis could be performed for space complexity, which is how much computer memory is required to solve a problem. Sometimes, an algorithm will trade computational complexity for space complexity. That is, we might be able to solve a problem much faster if we have more memory available.

For example, there has been research to improve the computational efficiency of matrix multiplication, yielding algorithms which do indeed run faster than traditional techniques. However, those algorithms don’t get implemented in general linear algebra libraries because they require far more memory than is available!
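
As a minimal sketch of this trade-off (our own illustration, unrelated to the matrix multiplication research above), memoizing the naive recursive Fibonacci calculation spends extra memory on a cache in exchange for drastically less computation:

# Naive recursion repeats the same subproblems: roughly O(2^n) additions.
fib_naive(n) = n <= 2 ? 1 : fib_naive(n - 1) + fib_naive(n - 2)

# Caching each result uses O(n) extra memory but only O(n) additions.
function fib_memo(n, cache=Dict{Int,Int}())
    n <= 2 && return 1
    get!(cache, n) do
        fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    end
end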

13.4.3 Complexity: Takeaways

The idea of algorithmic complexity is important because it grounds us in the harsh truth that some problems are very difficult to compute. It’s in these cases that a lot of the creativity and domain specific heuristics can become the foremost consideration.

We must remember to be thoughtful about the design of our models. When searching for additional performance, look for the “loops-within-loops”, which is where combinatorial explosions tend to happen. It is by focusing on the places that have large \(n\) or poor Big-O order that you can transform the performance of the overall model. Sometimes, though, the fundamental complexity of the problem at hand forbids greater efficiency.

See the end of this chapter, Section 13.7 for an example demonstrating computational complexity in the context of portfolio selection.

13.5 Data Structures

The science of data structures is about how data is represented conceptually and in computer applications.

For example, how should the string “abcdef” be represented and analyzed?

There are many common data structures and many specialized subtypes. We will describe some of the most common ones here. Julia has many data structures available in the Base library, but an extensive collection of other data structures can be found in the DataStructures.jl package.

13.5.1 Arrays

An array is a contiguous block of memory containing elements of the same type, accessed via integer indices. Arrays have fast random access and are the fastest data structure for linear/iterated access of data.

In Julia, an array is a very common data structure and is implemented with a simple declaration, such as:

x = [1,2,3]

In memory, the integers are stored as consecutive bits representing the integer values of 1, 2, and 3, and would look like this (with the different integers shown on new lines for clarity):

0000000000000000000000000000000000000000000000000000000000000001
0000000000000000000000000000000000000000000000000000000000000010
0000000000000000000000000000000000000000000000000000000000000011

This is great for accessing the values one-by-one or in consecutive groups, but it’s not efficient if values need to be inserted in between. For example, if we wanted to insert 0 between the 1 and 2 in x, then we’d need to ask the operating system to allocate more memory4 (if the existing block is full), shift the values that come after the insertion point, and write the new value into the second position. Inserting values at the end (push!(array, value)) is usually fast unless more memory needs to be allocated.
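
A short sketch of these operations using Julia’s built-in functions:

x = [1, 2, 3]
insert!(x, 2, 0)   # x is now [1, 0, 2, 3]; the trailing 2 and 3 had to be shifted
push!(x, 4)        # x is now [1, 0, 2, 3, 4]; usually cheap unless more memory is needed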

13.5.2 Linked Lists

A linked list is a chain of nodes where each node contains a value and a pointer to the next node. Linked lists allow for efficient insertion and deletion but slower random access compared to arrays.

In Julia, a simple linked list node could be implemented as:

mutable struct Node
    value::Any
    next::Union{Node, Nothing}   # `Nothing` represents the end of the linked list
end

z = Node(3,nothing)
y = Node(2,z)
x = Node(1,y)

Inserting a new node between existing nodes is efficient - if we wanted to insert a new node between the ones with value 2 and 3, we could do this:

a = Node(0,z) # Create a new `Node` with `next` equal to `z`
y.next = a    # Set the reference to `next` in `y` to be the new `Node` `a`

Accessing the \(n\)th element requires traversing the list from the beginning, following each Node’s next reference in turn. This iterative approach makes random access \(O(n)\). This is in contrast to an array, where you know right away where the \(n\)th item will be in the data structure.
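
A minimal sketch of that traversal, using the Node type defined above:

# Reach the nth element by hopping `next` pointers one at a time.
function nth_value(list::Node, n)
    node = list
    for _ in 1:(n - 1)
        node = node.next        # one hop per element: O(n) overall
    end
    return node.value
end

nth_value(x, 2)   # 2, reached by following one `next` pointer from `x`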

Also, the linked list is one-directional. Items further down the chain don’t know what node points to them, so it’s impossible to traverse the list backwards.

There are many related implementations which make random access or traversal in reverse order more efficient such as doubly-linked lists or “trees” (Section 13.5.6) which organize the data not as a chain but as a tree with branching nodes.

13.5.3 Records/Structs

A record is an aggregate of named fields, typically of fixed size and sequence; records group related data together. We’ve encountered structs in Section 5.4.7, but here we’ll add that simple structs with primitive fields can themselves be represented without creating pointers to the stored data:

struct SimpleBond
    id::Int
    par::Float64
end

struct LessSimpleBond
    id::String
    par::Float64
end

a = SimpleBond(1, 100.0)
b = LessSimpleBond("1", 100.0)
isbits(a), isbits(b)
(true, false)

Because a is comprised of simple elements, it can be represented as a contiguous set of bits in memory. It would look something like this in memory:

0000000000000000000000000000000000000000000000000000000000000001   (the bits of the integer 1)
0100000001011001000000000000000000000000000000000000000000000000   (the bits of the Float64 100.0)

In contrast, the LessSimpleBond uses a String to represent the ID of the bond. In Julia, String is an immutable type that internally references a buffer of bytes; because it holds a reference, a struct containing a String is not an isbits type.

<pointer/reference to the array of characters that comprise the string ID>
0100000001011001000000000000000000000000000000000000000000000000   (the bits of the Float64 100.0)

In performance critical code, having data that is represented with simple bits instead of references/pointers can be much faster (see Chapter 30 for an example).

Note

For many mutable types, there are immutable, bits-types alternatives. For example:

The downsides to the immutable alternatives (other than the loss of potentially desired flexibility that mutability provides) are that they can be harder on the compiler (more upfront compilation cost) to handle the specialized cases involved.

13.5.4 Dictionaries (Hash Tables)

13.5.4.1 Hashes and Hash Functions.

Hashes are the result of a hash function that maps arbitrary data to a fixed-size value. It’s sort of a “one way” mapping to a simpler value, which has two main benefits:

  1. It is one-way: if someone knows the hashed value, it’s very difficult to guess what the original value was. This is most useful in cryptographic and security applications.
  2. It creates (probabilistically) unique IDs for a given set of data.

For example, we can calculate an SHA hash on any data:

import SHA
let
    a = SHA.sha256("hello world") |> bytes2hex
    b = SHA.sha256(rand(UInt8, 10^6)) |> bytes2hex
    println(a)
    println(b)
end
b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
32ec5eb51b20244052dc016beb11c5e256ddabe3b64ac2f3a8388ab08d0c3ea6

We can easily verify that the sha256 hash of "hello world" is the same each time, but it’s virtually impossible to guess "hello world" if we are just given the resulting hash. This is the premise of trying to “crack” a password when the stored password hash is stolen.

One way to check if two sets of data are the same is to compute the hash of each and see if the resulting hashes are equal. For example, maybe you want to see if two data files with different names contain the same data: comparing the hashes is a practical way to determine whether they do, since a collision between different contents is astronomically unlikely.
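
For instance, a sketch along those lines (the file names here are hypothetical):

import SHA

# Read each file's raw bytes and compare the digests; identical hashes
# indicate the files contain the same data.
digest_a = SHA.sha256(read("portfolio_2024_q1.csv")) |> bytes2hex
digest_b = SHA.sha256(read("portfolio_2024_q1_backup.csv")) |> bytes2hex
digest_a == digest_b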

13.5.4.2 Dictionaries

Dictionaries map a key to a value. More specifically, they use the hash of a key to store a reference to the value.

Dictionaries offer constant-time average-case access but must handle potential collisions of keys (generally, more robust collision handling means a higher fixed cost for each access).

Here’s an illustrative portfolio of assets indexed by CUSIPs:

assets = Dict(
    # CUSIP => Asset
    "037833AH4" => Bond("General Electric",...),
    "912828M80" => Equity("AAPL",...),
    "594918BQ1" => Bond("ENRON",...),
)

Then, lookup is performed by indexing the dictionary by the desired key:

assets["037833AH4"] # gives the General Electric Bond
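
Because the Bond and Equity constructors above are only sketched, here is a minimal runnable variant (our own example) that maps the same illustrative CUSIPs to par values:

par_values = Dict(
    "037833AH4" => 100_000.0,
    "594918BQ1" => 250_000.0,
)

par_values["037833AH4"]            # 100000.0
haskey(par_values, "912828M80")    # false
get(par_values, "912828M80", 0.0)  # 0.0, a default used when the key is absent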

13.5.5 Graphs

A graph is a collection of nodes (also called vertices) connected by edges to represent relationships or connections between entities. Graphs are versatile data structures that can model various real-world scenarios such as social networks, transportation systems, or computer networks.

In Julia, a simple graph could be implemented using a dictionary where keys are nodes and values are lists of connected nodes:

struct Graph
    nodes::Dict{Any, Vector{Any}}
end

function add_edge!(graph::Graph, node1, node2)
    push!(get!(graph.nodes, node1, []), node2)
    push!(get!(graph.nodes, node2, []), node1)
end

g = Graph(Dict())
add_edge!(g, 1, 2)
add_edge!(g, 2, 3)
add_edge!(g, 1, 3)

This implementation represents an undirected graph. For a directed graph, you would only add the edge in one direction.

Graphs can be traversed using various algorithms such as depth-first search (DFS) or breadth-first search (BFS). These traversals are useful for finding paths, detecting cycles, or exploring connected components.
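
As a minimal sketch using the Graph struct defined above, a breadth-first search visits a node’s neighbors before moving farther out:

# Return the nodes reachable from `start`, in breadth-first (nearest-first) order.
function bfs(graph::Graph, start)
    visited = Any[start]
    queue = Any[start]
    while !isempty(queue)
        node = popfirst!(queue)
        for neighbor in get(graph.nodes, node, Any[])
            if !(neighbor in visited)
                push!(visited, neighbor)
                push!(queue, neighbor)
            end
        end
    end
    return visited
end

bfs(g, 1)   # [1, 2, 3] for the small graph constructed above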

For more advanced graph operations, the Graphs.jl package provides a comprehensive set of tools for working with graphs in Julia.

13.5.6 Trees

A tree is a hierarchical data structure with a root node and child subtrees. Each node in a tree can have zero or more child nodes, and every node (except the root) has exactly one parent node. Trees are widely used for representing hierarchical relationships, organizing data for efficient searching and sorting, and in various algorithms.

A simple binary tree node in Julia could be implemented as:

mutable struct TreeNode
    value::Any
    left::Union{TreeNode, Nothing}
    right::Union{TreeNode, Nothing}
end

# Creating a simple binary tree
root = TreeNode(1, 
    TreeNode(2, 
        TreeNode(4, nothing, nothing), 
        TreeNode(5, nothing, nothing)
    ), 
    TreeNode(3, 
        nothing, 
        TreeNode(6, nothing, nothing)
    )
)
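
As a minimal sketch, an in-order traversal of the tree above visits the left subtree, then the node itself, then the right subtree:

function inorder(node::Union{TreeNode, Nothing}, out=Any[])
    node === nothing && return out
    inorder(node.left, out)    # visit everything to the left first...
    push!(out, node.value)     # ...then this node...
    inorder(node.right, out)   # ...then everything to the right
    return out
end

inorder(root)   # [4, 2, 5, 1, 3, 6]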

Trees have various specialized forms, each with its own properties and use cases:

  • Binary Search Trees (BST): Each node has at most two children, with all left descendants less than the current node, and all right descendants greater.
  • AVL Trees: Self-balancing binary search trees, ensuring that the heights of the two child subtrees of any node differ by at most one.
  • B-trees: Generalization of binary search trees, allowing nodes to have more than two children. Commonly used in databases and file systems.
  • Trie (Prefix Tree): Used for efficient retrieval of keys in a dataset of strings. Each node represents a common prefix of some keys.

Trees support efficient operations like insertion, deletion, and searching, often with \(O(\log n)\) time complexity for balanced trees. They are fundamental in many algorithms and data structures, including heaps, syntax trees in compilers, and decision trees in machine learning.
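
For example, here is a sketch of searching a binary search tree (note that the example tree constructed above is not BST-ordered; this assumes a tree whose values follow the BST property). Each comparison discards one of the two subtrees, which is what yields roughly \(O(\log n)\) lookups in a balanced tree:

# Search a BST: smaller values live in the left subtree, larger in the right.
function bst_search(node::Union{TreeNode, Nothing}, target)
    node === nothing && return nothing    # not found
    target == node.value && return node   # found
    target < node.value ? bst_search(node.left, target) : bst_search(node.right, target)
end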

13.5.7 Data Structures Conclusion

Data structures have strengths and weaknesses depending on whether you want to prioritize computational efficiency, memory (space) efficiency, code simplicity, and/or mutability. Due to the complexity of real-world modeling needs, it can be the case that different representations of the data are more natural or more efficient for the use case at hand.

13.6 Verification and Advanced Testing

13.6.1 Formal Verification

Formal verification is a technique used to prove or disprove the correctness of algorithms with respect to a certain formal specification or property. In essence, it’s a mathematical approach to ensuring that a system behaves exactly as intended under all possible conditions.

In formal verification, we use mathematical methods to:

  1. Create a formal model of the system
  2. Specify the desired properties or behaviors
  3. Prove that the model satisfies these properties

This process can be automated using specialized software tools called theorem provers or model checkers.

13.6.1.1 Formal Verification in Practice

It sounds like the perfect risk management and regulatory technique: prove that the system works exactly as intended. However, there has been very limited deployment of formal verification in industry. This is for several reasons:

  1. Incomplete Coverage: It’s often impractical to formally verify entire large-scale financial systems. Verification, if at all, is typically limited to critical components.
  2. Incomplete Specification: Specifying how the system should behave in all scenarios requires contemplating, in a mathematically complete and rigorous way, every possibility that could occur.
  3. Model-Reality Gap: The formal model may not perfectly represent the real-world system, especially in finance where market behavior can be unpredictable.
  4. Changing Requirements: Financial regulations and market conditions change rapidly, potentially outdating formal verifications.
  5. Performance Trade-offs: Systems designed for easy formal verification might sacrifice performance or flexibility.
  6. Cost: The process can be expensive in terms of time and specialized labor.

13.6.2 Property Based Testing

Testing will be discussed in more detail in Chapter 12, but an intermediate concept between Formal Verification and typical software testing is property-based testing, which tests for general rules instead of specific examples.

For example, a function which is associative (\((a + b) + c = a + (b + c)\)) or commutative (\(a + b = b + a\)) can be tested with simple examples like:

using Test

myadd(a,b) = a + b

@test myadd(1,2) == myadd(2,1) # commutative
@test myadd(myadd(1,2),3) == myadd(1,myadd(2,3)) # associative

However, we really haven’t proven the associative and commutative properties in general. There are techniques that check such properties more comprehensively than the specific examples above; packages like Supposition.jl provide functionality for this. Note that like formal verification, property-based testing is a more advanced topic.

Warning

Property-based testing is also tied in with concepts related to types that we discussed in Section 5.4.1: floating point addition is not associative when summing more than two numbers at a time, because the order in which the floats get added affects the accumulated imprecision.
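
For example:

(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)   # false: the two orderings accumulate different rounding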

13.6.3 Fuzzing

Fuzzing is similar to property-based testing, but instead of testing general rules directly, we generalize the simple examples using randomness. For example, we could test the commutative property using random numbers, thereby statistically checking that the property holds:

@testset for i in 1:10000
    a = rand()
    b = rand()

    @test myadd(a,b) == myadd(b,a)
end

This is a good advancement over the simple @test myadd(1,2) == myadd(2,1) in terms of checking the correctness of myadd, but it comes at the cost of more computational time and non-deterministic tests. Fuzzing typically seeks to find edge cases: crashes or unexpected behavior uncovered by exploring random or invalid inputs.

13.7 Finance Example: Complexity and Portfolio Selection

Many real portfolio decisions include discrete choices: buy or skip a lot, include or exclude an ETF, cap the number of names, or enforce minimum ticket sizes. These discrete elements quickly turn an otherwise smooth optimization into a combinatorial search.

Consider a simplified subset selection problem with \(n\) candidate assets. Each asset \(i\) has an expected return \(r_i\) and a “cost” \(c_i\) (e.g., capital required for a minimum lot). We wish to select a subset that maximizes total expected return subject to a budget \(B\). This is the classic 0–1 knapsack problem:

\[ \max_{x \in \{0,1\}^n} \sum_{i=1}^{n} r_i\, x_i \quad \text{subject to} \quad \sum_{i=1}^{n} c_i\, x_i \le B \]

  • Brute force enumerates all \(2^n\) subsets to find the best feasible one. Complexity: \(O(2^n \cdot n)\) work, so it explodes rapidly.
  • A greedy heuristic sorts by value-to-cost ratio \(r_i/c_i\) and takes items while budget remains. Complexity: \(O(n \log n)\); fast, but not guaranteed optimal.
  • Branch-and-bound (B&B) remains worst‑case exponential, but prunes large parts of the search using an optimistic upper bound (here, the fractional knapsack relaxation), often exploring far fewer nodes in practice.

Rather than relying on wall‑clock times (which depend on your machine), we count “visited” nodes—how many partial solutions each method examines—to illustrate growth in work.

using Random

# Generate a synthetic instance. Pass an RNG for reproducibility.
function synth_instance(rng::AbstractRNG, n::Int; budget_ratio::Float64=0.35)
    r = 0.04 .+ 0.08 .* rand(rng, n)            # expected returns in [4%, 12%]
    c = Float64.(rand(rng, 50:150, n))          # costs (min lot "sizes"), as reals
    B = budget_ratio * sum(c)                   # budget as a real, not floored
    return r, c, B
end

# Greedy heuristic: sort by value-to-cost ratio and pack until budget is exhausted.
function greedy_knapsack(r::AbstractVector{<:Real},
    c::AbstractVector{<:Real},
    B::Real)
    n = length(r)
    ratio = r ./ c
    idx = sortperm(ratio, rev=true)           # highest ratio first
    total = 0.0
    cost = 0.0
    chosen = falses(n)
    @inbounds for i in idx
        if cost + c[i] <= B + 1e-12
            chosen[i] = true
            total += r[i]
            cost += c[i]
        end
    end
    return total, chosen
end

# Brute-force: enumerate all 2^n subsets (only feasible for small n).
function brute_force_knapsack(r::AbstractVector{<:Real},
    c::AbstractVector{<:Real},
    B::Real)
    n = length(r)
    best = -Inf
    bestmask = 0
    visited = 0
    total_subsets = (Int(1) << n)
    @inbounds for mask in 0:(total_subsets-1)
        visited += 1
        value = 0.0
        cost = 0.0
        for i in 1:n
            if ((mask >>> (i - 1)) & 0x1) == 1
                value += r[i]
                cost += c[i]
                if cost > B
                    break
                end
            end
        end
        if cost <= B && value > best
            best = value
            bestmask = mask
        end
    end
    chosen = falses(n)
    for i in 1:n
        chosen[i] = ((bestmask >>> (i - 1)) & 0x1) == 1
    end
    return best, chosen, visited
end

# Branch-and-Bound with a fractional-knapsack upper bound (optimistic).
# The upper bound relaxes x_i ∈ {0,1} to x_i ∈ [0,1] and "fills" the remaining budget fractionally.
function branch_and_bound_knapsack(r::AbstractVector{<:Real},
    c::AbstractVector{<:Real},
    B::Real)
    n = length(r)
    rs = Float64.(r)
    cs = Float64.(c)
    Bf = float(B)

    ratio = rs ./ cs
    order = sortperm(ratio, rev=true)
    rs = rs[order]
    cs = cs[order]

    best = Ref(-Inf)            # current best objective
    best_choice = falses(n)     # in sorted order
    visited = Ref(0)            # node counter
    choice = falses(n)          # current partial choice (sorted order)

    @inline function ubound(idx::Int, value::Float64, cost::Float64)
        if cost > Bf
            return -Inf
        end
        cap = Bf - cost
        ub = value
        @inbounds for j in idx:n
            if cs[j] <= cap + 1e-12
                ub += rs[j]
                cap -= cs[j]
            else
                ub += rs[j] * (cap / cs[j])     # fractional "fill" for bound
                break
            end
        end
        return ub
    end

    function dfs(idx::Int, value::Float64, cost::Float64)
        visited[] += 1
        if idx > n
            if cost <= Bf && value > best[]
                best[] = value
                best_choice .= choice
            end
            return
        end
        # Prune if even the optimistic bound cannot beat current best.
        if ubound(idx, value, cost) <= best[] + 1e-12
            return
        end
        # Branch: include item idx
        choice[idx] = true
        dfs(idx + 1, value + rs[idx], cost + cs[idx])
        # Branch: exclude item idx
        choice[idx] = false
        dfs(idx + 1, value, cost)
    end

    dfs(1, 0.0, 0.0)

    # Map best_choice back to the original indexing
    selected = falses(n)
    @inbounds for (k, i) in pairs(order)
        selected[i] = best_choice[k]
    end
    return best[], selected, visited[]
end

# Small demonstration: brute force is feasible at n ≈ 20; for n ≈ 40 we skip brute force.
let
    rng = MersenneTwister(1234)

    # Small instance where brute force is still possible
    n_small = 20
    r, c, B = synth_instance(rng, n_small; budget_ratio=0.35)

    bf_val, bf_sel, bf_visited = brute_force_knapsack(r, c, B)
    bb_val, bb_sel, bb_visited = branch_and_bound_knapsack(r, c, B)
    gr_val, gr_sel = greedy_knapsack(r, c, B)

    println("n = $n_small")
    println(" total subsets = ", 1 << n_small)                 # 1,048,576
    println(" brute force visited = ", bf_visited,
        ", optimal value = ", bf_val)
    println(" branch-and-bound visited = ", bb_visited,
        ", optimal value = ", bb_val)
    println(" greedy value = ", gr_val,
        ", optimality gap = ", bf_val - gr_val)

    # Larger instance—note how the search space explodes (2^40 ≈ 10^12)
    n_large = 40
    r2, c2, B2 = synth_instance(rng, n_large; budget_ratio=0.35)
    bb2_val, bb2_sel, bb2_visited = branch_and_bound_knapsack(r2, c2, B2)
    gr2_val, gr2_sel = greedy_knapsack(r2, c2, B2)

    println("\nn = $n_large")
    println(" total subsets = ", BigInt(1) << n_large)         # ≈ 1.1×10^12
    println(" branch-and-bound visited = ", bb2_visited,
        ", optimal value = ", bb2_val)
    println(" greedy value = ", gr2_val,
        ", optimality gap = ", bb2_val - gr2_val)
end
n = 20
 total subsets = 1048576
 brute force visited = 1048576, optimal value = 0.7605388597519076
 branch-and-bound visited = 317, optimal value = 0.7605388597519077
 greedy value = 0.703897494058528, optimality gap = 0.05664136569337963

n = 40
 total subsets = 1099511627776
 branch-and-bound visited = 115, optimal value = 1.4909468614490333
 greedy value = 1.4909468614490333, optimality gap = 0.0

What this shows in practice:

  • Brute force scales as \(2^n\). At \(n=20\), there are \(1{,}048{,}576\) subsets—still manageable. At \(n=40\), there are \(\approx 10^{12}\) subsets—completely infeasible to enumerate.
  • Greedy is fast and often near‑optimal, especially when ratios \(r_i/c_i\) align with the true optimum. However, it can be arbitrarily sub‑optimal in adversarial cases.
  • Branch‑and‑bound exploits structure. The fractional‑knapsack bound is optimistic—it assumes you could take fractions of remaining items—so any subtree whose bound cannot beat the current best is safely pruned. Worst‑case complexity remains exponential, but typical search can drop from billions of nodes to thousands or millions.
Tip: Financial Modeling Pro Tip

In realistic portfolio construction, practitioners frequently avoid pure subset selection by relaxing to continuous weights (e.g., mean–variance) or by formulating a mixed‑integer model and using modern MILP solvers. These solvers implement sophisticated branch‑and‑bound/branch‑and‑cut strategies, presolve, and cutting planes. When exact optimality is not essential, heuristics (greedy, local search, genetic/evolutionary methods) can deliver high‑quality portfolios quickly under operational constraints (cardinality, sector caps, turnover budgets).

See Chapter 27 for more discussion.


  1. This topic will be covered in Chapter 14.

  2. “Big-O”, so named because the ‘O’ denotes the order of growth, as in \(O(1)\), \(O(n)\), etc.

  3. The Traveling Salesperson Problem is a classic computer science problem where you need to find the shortest possible route that visits each city in a given set exactly once and returns to the starting city. It’s a seemingly simple problem that becomes computationally intensive very quickly as the number of cities increases.

  4. In practice, the operating system may have already allocated space for an array that’s larger than what the program is actually using so far, so this step may be ‘quick’ at times, while other times the operating system may actually need to extend the block of memory allocated to the array.