using CSV, DataFrames
using MortalityTables, FinanceCore
using Dates
using ThreadsX
using BenchmarkTools
using Random
using CairoMakie
25 Stochastic Mortality Projections
Alec Loudenback
25.1 Chapter Overview
A term life insurance policy is used to illustrate: selecting key model features, design tradeoffs between a few different approaches, and a discussion of the performance impacts of the different approaches to parallelism.
25.2 Introduction
Monte Carlo simulation is common in risk and valuation contexts. This worked example will create a population of term life insurance policies and simulate the associated claims stochastically. For this chapter, the focus is not so much on the outcomes of the model, but instead how and why the model was chosen to be setup in the way that it is.
The general structure of the example is:
- Define the datatypes and sample data
- Define the core logic that governs the projected outcomes for the modeled policies
- Evaluate a few ways to structure the simulation, including:
- allocating and non-allocating approaches
- single threaded and mutli-threaded approaches
As will be shown, the number of simulations able to be completed on modern CPU hardware is really remarkable!
25.3 Data and Types
25.3.1 @enums
and the Policy
Type
Our core unit of simulation with be a single life insurance Policy
. Important characteristics include: the age a policy was issued at, the sex of the insured, and risk class to determine the assumed mortality rate. To make the example more realistic and demonstrate how it might look for a real block of inforce policies, additional fields have been included such as ID (not really used, but a common identifier) and COLA
which is a cost-of-living-adjustment, used to modify the policy benefit through time. To be clear: the Policy
type has more fields than will actually be used in the calculations, with the purpose to show how typical fields used in practice might be defined.
Before we define the core Policy
type, there’s a couple of types we might consider defining that would subsequently get used by Policy
: types representing sex and risk class. A typical approach might be to simply define associated types, like this:
abstract type Sex end
struct Male <: Sex end
struct Female <: Sex end
Then, if we were to include a sex field in Policy, we could write it like this:
struct Policy
# ...
::Sex
sex# ...
end
This would be a totally valid and logical approach! However, high performance is a top priority for this simulation, and therefore this approach would sacrifice being able to keep Policy
data on the stack instead of the heap. This is because Sex
is of unknown concrete type, which could be as simple as our definition above (with no fields in Male
and Female
) or someone could add a new Alien
subtype of Sex
with a number of associated data fields! This is why Julia cannot assume that subtypes of Sex
will always be a simple Singletons with no associated data.
Instead, we can utilize Enums, which are a sort of lightweight type where the only thing that matters with it is distinguishing between categories. Enums in Base Julia are basically a set of constants grouped together that reference an associated integer.
@enum Sex Female = 1 Male = 2
@enum Risk Standard = 1 Preferred = 2
@enum PaymentMode Annual = 1 Quarterly = 4 Monthly = 12
Enums are convenient because it lets us use human-meaningful names for integer-based categories. Julia will also keep track of valid options: we cannot now use anything other than Female
or Male
where we have said a Sex
must be specified.
There exist Julia packages which are more powerful versions of Enums, essentially leaning into the ability to use the type system instead of just nice names for categorical variables.
Moving on to the definition of the policy itself, here’s what that looks like. Note that every field has a type annotation associated with it.
struct Policy
::Int
id::Sex
sex::Float64
benefit_base::Float64
COLA::PaymentMode
mode::Date
issue_date::Int
issue_age::Risk
riskend
The benefit of the way we have defined it here, using simple bits-types for each field is that our new composite Policy
type is also a bitstype:
let
= Policy(1, Male, 1e6, 0.02, Annual, today(), 50, Standard)
p
isbits(p)
end
true
Type annotations are optional, but providing them is able to coerce the values to be all plain bits (i.e. simple, non-referenced values like arrays are) when the type is constructed. We could have defined Policy
without the types specified:
struct Policy
id
sex...
riskend
Leaving out the annotations here forces Julia to assume that Any
type could be used for the given field. Having the field be of type Any
makes the instantiated struct data be stored in the heap, since Julia can’t know the size of Policy
in bits in advance.
We would also find that the un-annotated type is about 50 times slower than the one with annotation due to the need to utilize runtime lookup and reference memory on the heap instead of the stack.
25.3.2 The Data
To partially illustrate a common workflow, we’ll pretend that the data we are interested in comes from a CSV file, which will be defined inline using an IOBuffer
so that the structure of the source data is clear to the reader. Only two policies will be listed for brevity, but we will duplicate them for simulation purposes later on.
=
sample_csv_data IOBuffer(
raw"id,sex,benefit_base,COLA,mode,issue_date,issue_age,risk
1,M,100000.0,0.03,12,1999-12-05,30,Std
2,F,200000.0,0.03,12,1999-12-05,30,Pref"
)
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=152, maxsize=Inf, ptr=1, mark=-1)
We will not load the sample data using a common pattern:
- Load the source file into a
DataFrame
map
over each row of the dataframe, and return an instantiatedPolicy
object- Within the map, apply basic data parsing and translation logic as needed.
= let
policies
# read CSV directly into a dataframe
1= CSV.read(sample_csv_data, DataFrame)
df
# map over each row and construct an array of Policy objects
map(eachrow(df)) do row
Policy(
row.id,== "M" ? Male : Female,
row.sex
row.benefit_base,
row.COLA,PaymentMode(row.mode),
row.issue_date,
row.issue_age,== "Std" ? Standard : Preferred,
row.risk
)end
end
- 1
-
CSV.read("sample_inforce.csv",DataFrame)
could be used if the data really was in a CSV file namedsample_inforce.csv
instead of our demonstrationIOBuffer
.
2-element Vector{Policy}:
Policy(1, Male, 100000.0, 0.03, Monthly, Date("1999-12-05"), 30, Standard)
Policy(2, Female, 200000.0, 0.03, Monthly, Date("1999-12-05"), 30, Preferred)
25.4 Model Assumptions
25.4.1 Mortality Assumption
MortalityTables.jl provides common life insurance industry tables, and we will use two tables: one each for male and female policies respectively.
= Dict(
mort 1=> MortalityTables.table(988).ultimate,
Male => MortalityTables.table(992).ultimate,
Female
)
function mortality(pol::Policy, params)
return params.mortality[pol.sex]
end
- 1
-
ultimate
refers to not differentiating the mortality by a ‘select’ underwriting period, which is common but unnecessary for the demonstration in this chapter.
mortality (generic function with 1 method)
25.5 Model Structure and Mechanics
25.5.1 Core Model Behavior
The overall flow of the model loop will be as follows:
- Determine some initialized values for each policy at the start of the projection.
- Step through annual timesteps and simulate whether a death has occurred.
- If a death has occurred, log the benefit paid out.
- If a death has not occurred keep track of the remaining lives inforce and increment policy values.
The code is shown first and then discussion will follow it:
function pol_project!(out, policy, params)
# some starting values for the given policy
= length(policy.issue_date:Year(1):params.val_date) + 1
dur = policy.issue_age + dur - 1
start_age = (1 + policy.COLA)
COLA_factor = policy.benefit_base * COLA_factor^(dur - 1)
cur_benefit
# get the right mortality vector
= mortality(policy, params)
qs
# grab the current thread's id to write to results container
# without conflicting with other threads
= Threads.threadid()
tid
= lastindex(qs)
ω
1@inbounds for t in 1:min(params.proj_length, ω - start_age)
= qs[start_age+t] # get current mortality
q
if (rand() < q)
# if dead then just return and don't increment the results anymore
2+= cur_benefit
out.benefits[t, tid] return
else
# pay benefit, add a life to the output count, and increment the benefit for next year
+= 1
out.lives[t, tid] *= COLA_factor
cur_benefit end
end
end
- 1
-
inbounds
turns off bounds-checking, which makes hot loops faster but first write loop without it to ensure you don’t create an error (will crash if you have the error without bounds checking) - 2
- Note that the loop is iterating down a column (i.e. across rows) for efficiency (since Julia is column-major).
pol_project! (generic function with 1 method)
25.5.2 Inputs and Outputs
25.5.2.1 Inputs
The general approach for non-allocating model runs is to provide a previously instantiated container for the function to write the results to. Here, the incoming argument out
(put) will be a named tuple with associated matrices as the lives
and benefits
fields. We know how many periods the model will simulate for and can therefore size the array appropriately at creation.
Other inputs include: params
which defines some global-like parameters and policy
which is a single Policy
object.
Note that the unit of the core model logic is a single policy. This simplifies the logic and reduces the chance for error due to needing to code for entire arrays of policies at a single time (as would be the case for array oriented programming style, as described in Section 6.5).
25.5.3 Threading
The simulations is using a threaded parallelism approach where it could be operating on any of the computer’s available threads. Multi-processor (multi-machine) or GPU-based computation would require some modifications see (Chapter 11). For the scale and complexity of this example, thread-based parallelism on a single CPU is all one should need for compute.
The threads are handled by distributed the work across threads (this is done by ThreadsX
in the foreach
loop below), but we need to write the appropriate place in the matrix so that threads do not compete for the same column in the output data. Therefore, when we create the lives
and benefits
matrices we need to have them sized so that the number of rows is the number of projection periods and the number of columns is the number of threads.
25.5.4 Simulation Control
Parameters for our projection:
= (
params =Date(2021, 12, 31),
val_date=100,
proj_length=mort,
mortality )
(val_date = Date("2021-12-31"), proj_length = 100, mortality = Dict{Sex, OffsetArrays.OffsetVector{Float64, Vector{Float64}}}(Male => [0.022571, 0.022571, 0.022571, 0.022571, 0.022571, 0.022571, 0.022571, 0.022571, 0.022571, 0.022571 … 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4], Female => [0.00745, 0.00745, 0.00745, 0.00745, 0.00745, 0.00745, 0.00745, 0.00745, 0.00745, 0.00745 … 0.376246, 0.386015, 0.393507, 0.398308, 0.4, 0.4, 0.4, 0.4, 0.4, 1.0]))
Having defined the model behavior at the unit of the policy above, we now need to define how the model should iterate over the entire population of interest. Given a vector of Policy
s in the policies
argument, the project
function will:
- Create output containers, keeping in mind the projection length and number of threads being used.
- Loop over each policy, letting
ThreadsX.foreach
distribute the work across different threads. - Sum up the results across threads via
reduce
.
function project(policies, params)
= Threads.nthreads()
threads = zeros(params.proj_length, threads)
benefits = zeros(Int, params.proj_length, threads)
lives = (; benefits, lives)
out foreach(policies) do pol
ThreadsX.pol_project!(out, pol, params)
end
1map(x -> vec(reduce(+, x, dims=2)), out)
end
- 1
-
vec
turns the result into a 1D vector instead of a 1D matrix for later convenience.
project (generic function with 1 method)
25.6 Running the projection
Example of a single projection:
project(repeat(policies, 100_000), params) # <!>
(benefits = [1.2964489791590672e9, 1.3534343160813382e9, 1.4442880158754828e9, 1.533336391223668e9, 1.5520146281918643e9, 1.6557732589310603e9, 1.7066247394519851e9, 1.801756932364091e9, 1.8573096885422015e9, 1.9364622322752106e9 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [195007, 189943, 184755, 179466, 174255, 168905, 163591, 158182, 152853, 147490 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
repeat
creates a vector that repeats the two demonstration policies many times.
25.6.1 Stochastic Projection
This defines a loop to calculate the results n
times (this is only running the two policies in the sample data n
times). This is emulating running our population of policies through \(n\) stochastic scenarios, similar to what might be done for a risk or pricing exercise.
function stochastic_proj(policies, params, n)
map(1:n) do i
ThreadsX.project(policies, params)
end
end
stochastic_proj (generic function with 1 method)
25.6.1.1 Demonstration
We’ll simulate the two policies’ outcomes 1,000 times and visualize the resulting distribution of claims value:
= stochastic_proj(policies, params, 1000) stoch
1000-element Vector{@NamedTuple{benefits::Vector{Float64}, lives::Vector{Int64}}}:
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 471313.1012018762, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 1, 1, 1, 1 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 257508.2755685113 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 1 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 235656.5506009381, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 1, 1, 1, 1 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 222128.90055701582, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 1, 1, 1, 1, 1, 1 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
⋮
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 215659.12675438428, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 1, 1, 1, 1, 1, 1, 1 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [197358.6511126606, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
(benefits = [0.0, 0.0, 0.0, 0.0, 0.0, 457585.5351474526, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], lives = [2, 2, 2, 2, 2, 1, 1, 1, 1, 1 … 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
let
= [pv(0.03, s.benefits) for s in stoch]
v hist(v,
=15,
bins=(
axis="Present Value of Benefits",
xlabel="Number of scenarios"
ylabel
)
)end
25.7 Benchmarking
Using a 2024 Macbook Air M3 laptop, about 45 million policies able to be stochastically projected per second:
= 4_500_000
policies_to_benchmark # adjust the `repeat` depending on how many policies are already in the array
# to match the target number for the benchmark
= policies_to_benchmark ÷ length(policies)
n
@benchmark project(p, r) setup = (p = repeat($policies, $n); r = $params)
BenchmarkTools.Trial: 57 samples with 1 evaluation per sample. Range (min … max): 64.237 ms … 69.369 ms ┊ GC (min … max): 0.00% … 0.00% Time (median): 64.476 ms ┊ GC (median): 0.00% Time (mean ± σ): 64.688 ms ± 825.622 μs ┊ GC (mean ± σ): 0.00% ± 0.00% ▆██ ▁▄ ███▆▇██▇▆▇▇▁▄▄▁▁▁▁▁▁▁▁▁▁▁▁▄▄▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▄ ▁ 64.2 ms Histogram: frequency by time 67 ms < Memory estimate: 69.09 KiB, allocs estimate: 504.
25.8 Conclusion
This example has worked through a recommended pattern of setting up and running a stochastic simulation using a threaded approach to parallelism. The results show that quite a bit of simulation power is available using even consumer laptop hardware!