25  Stochastic Mortality Projections

Alec Loudenback

25.1 Chapter Overview

A term life insurance policy is used to illustrate: selecting key model features, design tradeoffs between a few different approaches, and a discussion of the performance impacts of the different approaches to parallelism.

25.2 Introduction

Monte Carlo simulation is common in risk and valuation contexts. This worked example will create a population of term life insurance policies and simulate the associated claims stochastically. For this chapter, the focus is not so much on the outcomes of the model, but instead how and why the model was chosen to be setup in the way that it is.

The general structure of the example is:

  1. Define the datatypes and sample data
  2. Define the core logic that governs the projected outcomes for the modeled policies
  3. Evaluate a few ways to structure the simulation, including:
  4. allocating and non-allocating approaches
  5. single threaded and mutli-threaded approaches

As will be shown, the number of simulations able to be completed on modern CPU hardware is really remarkable!

using CSV, DataFrames
using MortalityTables, FinanceCore
using Dates
using ThreadsX
using BenchmarkTools
using Random
using CairoMakie

25.3 Data and Types

25.3.1 @enums and the Policy Type

Our core unit of simulation with be a single life insurance Policy. Important characteristics include: the age a policy was issued at, the sex of the insured, and risk class to determine the assumed mortality rate. To make the example more realistic and demonstrate how it might look for a real block of inforce policies, additional fields have been included such as ID (not really used, but a common identifier) and COLA which is a cost-of-living-adjustment, used to modify the policy benefit through time. To be clear: the Policy type has more fields than will actually be used in the calculations, with the purpose to show how typical fields used in practice might be defined.

Before we define the core Policy type, there’s a couple of types we might consider defining that would subsequently get used by Policy: types representing sex and risk class. A typical approach might be to simply define associated types, like this:

abstract type Sex end

struct Male <: Sex end
struct Female <: Sex end

Then, if we were to include a sex field in Policy, we could write it like this:

struct Policy
    # ... 
    sex::Sex
    # ...
end

This would be a totally valid and logical approach! However, high performance is a top priority for this simulation, and therefore this approach would sacrifice being able to keep Policy data on the stack instead of the heap. This is because Sex is of unknown concrete type, which could be as simple as our definition above (with no fields in Male and Female) or someone could add a new Alien subtype of Sex with a number of associated data fields! This is why Julia cannot assume that subtypes of Sex will always be a simple Singletons with no associated data.

Instead, we can utilize Enums, which are a sort of lightweight type where the only thing that matters with it is distinguishing between categories. Enums in Base Julia are basically a set of constants grouped together that reference an associated integer.

@enum Sex Female = 1 Male = 2
@enum Risk Standard = 1 Preferred = 2
@enum PaymentMode Annual = 1 Quarterly = 4 Monthly = 12

Enums are convenient because it lets us use human-meaningful names for integer-based categories. Julia will also keep track of valid options: we cannot now use anything other than Female or Male where we have said a Sex must be specified.

Note

There exist Julia packages which are more powerful versions of Enums, essentially leaning into the ability to use the type system instead of just nice names for categorical variables.

Moving on to the definition of the policy itself, here’s what that looks like. Note that every field has a type annotation associated with it.

struct Policy
    id::Int
    sex::Sex
    benefit_base::Float64
    COLA::Float64
    mode::PaymentMode
    issue_date::Date
    issue_age::Int
    risk::Risk
end

The benefit of the way we have defined it here, using simple bits-types for each field is that our new composite Policy type is also a bitstype:

let
    p = Policy(1, Male, 1e6, 0.02, Annual, today(), 50, Standard)

    isbits(p)
end
true
Note

Type annotations are optional, but providing them is able to coerce the values to be all plain bits (i.e. simple, non-referenced values like arrays are) when the type is constructed. We could have defined Policy without the types specified:

struct Policy
    id
    sex
    ...
    risk
end

Leaving out the annotations here forces Julia to assume that Any type could be used for the given field. Having the field be of type Any makes the instantiated struct data be stored in the heap, since Julia can’t know the size of Policy in bits in advance.

We would also find that the un-annotated type is about 50 times slower than the one with annotation due to the need to utilize runtime lookup and reference memory on the heap instead of the stack.

25.3.2 The Data

To partially illustrate a common workflow, we’ll pretend that the data we are interested in comes from a CSV file, which will be defined inline using an IOBuffer so that the structure of the source data is clear to the reader. Only two policies will be listed for brevity, but we will duplicate them for simulation purposes later on.

sample_csv_data =
    IOBuffer(
        raw"id,sex,benefit_base,COLA,mode,issue_date,issue_age,risk
         1,M,100000.0,0.03,12,1999-12-05,30,Std
         2,F,200000.0,0.03,12,1999-12-05,30,Pref"
    )
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=152, maxsize=Inf, ptr=1, mark=-1)

We will not load the sample data using a common pattern:

  1. Load the source file into a DataFrame
  2. map over each row of the dataframe, and return an instantiated Policy object
  3. Within the map, apply basic data parsing and translation logic as needed.
policies = let

    # read CSV directly into a dataframe
1    df = CSV.read(sample_csv_data, DataFrame)

    # map over each row and construct an array of Policy objects
    map(eachrow(df)) do row
        Policy(
            row.id,
            row.sex == "M" ? Male : Female,
            row.benefit_base,
            row.COLA,
            PaymentMode(row.mode),
            row.issue_date,
            row.issue_age,
            row.risk == "Std" ? Standard : Preferred,
        )
    end


end
1
CSV.read("sample_inforce.csv",DataFrame) could be used if the data really was in a CSV file named sample_inforce.csv instead of our demonstration IOBuffer.
2-element Vector{Policy}:
 Policy(1, Male, 100000.0, 0.03, Monthly, Date("1999-12-05"), 30, Standard)
 Policy(2, Female, 200000.0, 0.03, Monthly, Date("1999-12-05"), 30, Preferred)

25.4 Model Assumptions

25.4.1 Mortality Assumption

MortalityTables.jl provides common life insurance industry tables, and we will use two tables: one each for male and female policies respectively.

mort = Dict(
1    Male => MortalityTables.table(988).ultimate,
    Female => MortalityTables.table(992).ultimate,
)

function mortality(pol::Policy, params)
    return params.mortality[pol.sex]
end
1
ultimate refers to not differentiating the mortality by a ‘select’ underwriting period, which is common but unnecessary for the demonstration in this chapter.
mortality (generic function with 1 method)

25.5 Model Structure and Mechanics

25.5.1 Core Model Behavior

The overall flow of the model loop will be as follows:

  1. Determine some initialized values for each policy at the start of the projection.
  2. Step through annual timesteps and simulate whether a death has occurred.
    1. If a death has occurred, log the benefit paid out.
    2. If a death has not occurred keep track of the remaining lives inforce and increment policy values.

The code is shown first and then discussion will follow it:

function pol_project!(out, policy, params)
    # some starting values for the given policy
    dur = length(policy.issue_date:Year(1):params.val_date) + 1
    start_age = policy.issue_age + dur - 1
    COLA_factor = (1 + policy.COLA)
    cur_benefit = policy.benefit_base * COLA_factor^(dur - 1)

    # get the right mortality vector
    qs = mortality(policy, params)

    # grab the current thread's id to write to results container 
    # without conflicting with other threads
    tid = Threads.threadid()

    ω = lastindex(qs)

1    @inbounds for t in 1:min(params.proj_length, ω - start_age)

        q = qs[start_age+t] # get current mortality

        if (rand() < q)
            # if dead then just return and don't increment the results anymore
2            out.benefits[t, tid] += cur_benefit
            return
        else
            # pay benefit, add a life to the output count, and increment the benefit for next year
            out.lives[t, tid] += 1
            cur_benefit *= COLA_factor
        end
    end
end
1
inbounds turns off bounds-checking, which makes hot loops faster but first write loop without it to ensure you don’t create an error (will crash if you have the error without bounds checking)
2
Note that the loop is iterating down a column (i.e. across rows) for efficiency (since Julia is column-major).
pol_project! (generic function with 1 method)

25.5.2 Inputs and Outputs

25.5.2.1 Inputs

The general approach for non-allocating model runs is to provide a previously instantiated container for the function to write the results to. Here, the incoming argument out(put) will be a named tuple with associated matrices as the lives and benefits fields. We know how many periods the model will simulate for and can therefore size the array appropriately at creation.

Other inputs include: params which defines some global-like parameters and policy which is a single Policy object.

Note

Note that the unit of the core model logic is a single policy. This simplifies the logic and reduces the chance for error due to needing to code for entire arrays of policies at a single time (as would be the case for array oriented programming style, as described in Section 6.5).

25.5.3 Threading

The simulations is using a threaded parallelism approach where it could be operating on any of the computer’s available threads. Multi-processor (multi-machine) or GPU-based computation would require some modifications see (Chapter 11). For the scale and complexity of this example, thread-based parallelism on a single CPU is all one should need for compute.

The threads are handled by distributed the work across threads (this is done by ThreadsX in the foreach loop below), but we need to write the appropriate place in the matrix so that threads do not compete for the same column in the output data. Therefore, when we create the lives and benefits matrices we need to have them sized so that the number of rows is the number of projection periods and the number of columns is the number of threads.

25.5.4 Simulation Control

Parameters for our projection:

params = (
    val_date=Date(2021, 12, 31),
    proj_length=100,
    mortality=mort,
)
(val_date = Date("2021-12-31"), proj_length = 100, mortality = Dict{Sex, OffsetArrays.OffsetVector{Float64, Vector{Float64}}}(Male => [0.022571, 0.022571, 0.022571, 0.022571, 0.022571, 0.022571, 0.022571, 0.022571, 0.022571, 0.022571  …  0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4], Female => [0.00745, 0.00745, 0.00745, 0.00745, 0.00745, 0.00745, 0.00745, 0.00745, 0.00745, 0.00745  …  0.376246, 0.386015, 0.393507, 0.398308, 0.4, 0.4, 0.4, 0.4, 0.4, 1.0]))

Having defined the model behavior at the unit of the policy above, we now need to define how the model should iterate over the entire population of interest. Given a vector of Policys in the policies argument, the project function will:

  1. Create output containers, keeping in mind the projection length and number of threads being used.
  2. Loop over each policy, letting ThreadsX.foreach distribute the work across different threads.
  3. Sum up the results across threads via reduce.
function project(policies, params)
    threads = Threads.nthreads()
    benefits = zeros(params.proj_length, threads)
    lives = zeros(Int, params.proj_length, threads)
    out = (; benefits, lives)
    ThreadsX.foreach(policies) do pol
        pol_project!(out, pol, params)
    end
1    map(x -> vec(reduce(+, x, dims=2)), out)
end
1
vec turns the result into a 1D vector instead of a 1D matrix for later convenience.
project (generic function with 1 method)

25.6 Running the projection

Example of a single projection:

project(repeat(policies, 100_000), params) # <!>
(benefits = [1.2964489791590672e9, 1.3534343160813382e9, 1.4442880158754828e9, 1.533336391223668e9, 1.5520146281918643e9, 1.6557732589310603e9, 1.7066247394519851e9, 1.801756932364091e9, 1.8573096885422015e9, 1.9364622322752106e9  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [195007, 189943, 184755, 179466, 174255, 168905, 163591, 158182, 152853, 147490  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
  1. repeat creates a vector that repeats the two demonstration policies many times.

25.6.1 Stochastic Projection

This defines a loop to calculate the results n times (this is only running the two policies in the sample data n times). This is emulating running our population of policies through \(n\) stochastic scenarios, similar to what might be done for a risk or pricing exercise.

function stochastic_proj(policies, params, n)

    ThreadsX.map(1:n) do i
        project(policies, params)
    end
end
stochastic_proj (generic function with 1 method)

25.6.1.1 Demonstration

We’ll simulate the two policies’ outcomes 1,000 times and visualize the resulting distribution of claims value:

stoch = stochastic_proj(policies, params, 1000)
1000-element Vector{@NamedTuple{benefits::Vector{Float64}, lives::Vector{Int64}}}:
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 471313.1012018762, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 1, 1, 1, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 257508.2755685113  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 235656.5506009381, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 1, 1, 1, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 222128.90055701582, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 1, 1, 1, 1, 1, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 ⋮
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 215659.12675438428, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 1, 1, 1, 1, 1, 1, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [197358.6511126606, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
 (benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 457585.5351474526, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 1, 1, 1, 1, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
let
    v = [pv(0.03, s.benefits) for s in stoch]
    hist(v,
        bins=15,
        axis=(
            xlabel="Present Value of Benefits",
            ylabel="Number of scenarios"
        )
    )
end

25.7 Benchmarking

Using a 2024 Macbook Air M3 laptop, about 45 million policies able to be stochastically projected per second:

policies_to_benchmark = 4_500_000
# adjust the `repeat` depending on how many policies are already in the array
# to match the target number for the benchmark
n = policies_to_benchmark ÷ length(policies)

@benchmark project(p, r) setup = (p = repeat($policies, $n); r = $params)
BenchmarkTools.Trial: 57 samples with 1 evaluation per sample.
 Range (minmax):  64.237 ms 69.369 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     64.476 ms                GC (median):    0.00%
 Time  (mean ± σ):   64.688 ms ± 825.622 μs   GC (mean ± σ):  0.00% ± 0.00%

  ▆██                                                      
  ███▆▇█▇▆▇▁▄▄▁▁▁▁▁▁▁▁▁▁▁▁▄▄▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▄ ▁
  64.2 ms         Histogram: frequency by time           67 ms <

 Memory estimate: 69.09 KiB, allocs estimate: 504.

25.8 Conclusion

This example has worked through a recommended pattern of setting up and running a stochastic simulation using a threaded approach to parallelism. The results show that quite a bit of simulation power is available using even consumer laptop hardware!