32  More Useful Techniques

Author

Alec Loudenback

32.1 Chapter Overview

Other useful techniques are surveyed, such as memoization to avoid repeated computation, pseudo–Monte Carlo, creating a model office, and tips on modeling a complete balance sheet. Also covered are elements of practical review, such as static and dynamic validation, implied rate analysis, and explanatory vs. predictive modeling considerations.

32.2 Conceptual Techniques

32.2.1 Taking Things to the Extreme

Consider what happens if something is taken to an extreme. For example, what happens in the model if we input negative rates? Where should negative rates be allowed and can the model handle them?
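
For instance, a minimal sketch using a toy discount-factor function (hypothetical, not any particular model's implementation) shows how pushing the rate input to extremes reveals where guards may be needed:

# A toy discount-factor function standing in for a model component (hypothetical)
discount(rate, t) = 1 / (1 + rate)^t

discount(0.05, 10)    # ≈ 0.614: the ordinary case
discount(-0.01, 10)   # ≈ 1.105: mildly negative rates give factors above 1, which may be intended
discount(-1.0, 10)    # Inf: a rate of -100% divides by zero, so the model should guard against it
discount(-1.5, 10)    # 1024.0: rates below -100% produce nonsensical positive factors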

32.2.2 Range Bounding

Sometimes you just need to know that an outcome falls within a certain range. If you can develop "high" and "low" estimates by making assumptions that you know bound the feasible range, then you can determine whether something is reasonable or within tolerances.

To take an example from the pages of interview questions: say you need to determine whether a mortgaged property’s value is greater than the amount of the outstanding loan (say $100,000). You don’t have an appraisal, but you know that the house is in reasonable condition and that a comparable house with many more issues sold for $100 per square foot. You also don’t know the square footage, but from the number of rooms and the layout you know it must be at least 1,000 square feet. Therefore the value should be at least:

\[ \frac{\$100}{\text{sq. ft}} \times 1000 \text{sq. ft} = \$100,000 \]

We’d then conclude that the value of the house very likely exceeds the outstanding balance of the loan, resolving our question without complex modeling or an expensive appraisal.

32.3 Modeling Techniques

32.3.1 Serialization

Serialization is the process of converting a data structure or object state into a format that can be easily stored or transmitted, allowing it to be reconstructed later.

In most finance workflows, the slowest parts are not the regressions themselves—they are the data prep, calibration, and scenario generation steps that lead up to them. If you are pricing thousands of scenarios, rolling a model office forward month-by-month, or recalibrating prepayment and default curves, rerunning those steps on every iteration wastes time and money and complicates audits. Serialization lets you checkpoint those expensive steps and ship lightweight “artifacts” between notebooks, jobs, and environments.

Tip

Why serialize?

  • Speed and cost: avoid recomputing expensive steps (e.g., yield curve calibration, Monte Carlo paths) between runs.
  • Reproducibility and audit: persist a snapshot of the “model office” (data, parameters, code version, random seeds) so results can be reproduced for validation and regulators.
  • Deployment: move model artifacts between environments in a controlled, versioned way.

What to use when:

  • Serialization stdlib (.jls). Pros: fast, no extra dependency, preserves Julia types. Cons: not stable across Julia versions; Julia-only. Typical use: short-lived caches, memoization artifacts.
  • JLD2 (.jld2). Pros: portable binary, stores multiple named arrays/structs, widely used. Cons: extra dependency; still Julia-focused. Typical use: persisting model states and results across sessions/machines.
  • Arrow/Parquet. Pros: language-agnostic, columnar, efficient for large tables. Cons: heavier dependency; not for arbitrary Julia structs. Typical use: large tabular market/position data for interop.
  • CSV/JSON/TOML. Pros: human-readable, easy diffing/versioning. Cons: larger files, slower, lossy for binary data. Typical use: configs, small tables, metadata sidecars.

32.3.1.1 Serialization Principles

Design principles:

  • Minimality: Save just enough to reproduce downstream results (parameters, seeds, small derived tables), not entire raw datasets unless necessary.
  • Determinism: Include the random seed and any non-default options so recomputation is bit-for-bit identical when needed.
  • Portability: Prefer concrete, serializable types (structs, arrays, Dict) and stable formats when artifacts will live across Julia versions or be shared with others.
  • Traceability: Attach metadata (model version, code commit, created_at, inputs’ file hashes) so an auditor or colleague can answer “what produced this file?” a year later.

What to serialize vs. recompute:

  • Serialize: fitted parameters, calibrated curves, scenario indexes, precomputed shocks, and intermediate aggregates that are expensive to produce but compact.
  • Recompute: anything cheap, or large raw inputs that you can reload from a columnar format (Arrow/Parquet).
  • Reference big inputs by path and hash in the artifact’s metadata rather than embedding them.

Operational guidance:

  • Version your artifacts: embed minimal metadata (e.g., julia_version, created_at, model_version) and, if possible, a git commit hash for traceability.
  • Keep configs human-readable: store run configuration in JSON/TOML and reference it from binary artifacts (a sketch combining these conventions follows this list).
  • Separate data from models: store large tabular datasets in Arrow/Parquet; store small model objects/results in JLD2/Serialization.
  • Sensitive data: never serialize secrets. Encrypt at rest if files contain PII; control access with OS permissions.
  • Interop: do not deserialize untrusted files. Prefer Arrow/Parquet/CSV for sharing with non-Julia systems.
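
A minimal sketch of these conventions follows. The file names and fields are illustrative rather than a prescribed schema, and Arrow.jl is assumed to be available as an extra dependency:

using TOML, SHA   # standard libraries
using Arrow       # extra dependency, assumed available for this sketch

mkpath("artifacts")

# Store the large tabular input in a columnar, language-agnostic format
positions = (id = collect(1:3), notional = [1.0e6, 2.5e6, 8.0e5], rating = ["AA", "A", "BBB"])
data_path = joinpath("artifacts", "positions.arrow")
Arrow.write(data_path, positions)

# Keep the run configuration human-readable and reference the data by path and hash
config = Dict(
    "model_version" => "v2",
    "julia_version" => string(VERSION),
    "inputs" => Dict(
        "positions_path"   => data_path,
        "positions_sha256" => bytes2hex(sha256(read(data_path))),
    ),
)
open(joinpath("artifacts", "run_config.toml"), "w") do io
    TOML.print(io, config)
end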

32.3.1.2 Example: Snapshot a “Model Office” State

Capture parameters, fitted coefficients, seeds, and minimal metadata. Prefer concrete, serializable structs and plain arrays to keep files portable.

using Dates, Serialization

struct ModelState
    θ::Vector{Float64}        # fitted parameters (example)
    seed::Int64              # RNG seed used for the run
    timestamp::DateTime       # when the snapshot was created
    note::String              # short description
end

# Atomic write to avoid half-written files
function atomic_serialize(path::AbstractString, obj)
    dir = dirname(path)
    mkpath(dir)
    tmp = tempname(dir)
    serialize(tmp, obj)
    mv(tmp, path; force=true)
    return path
end

# Example: save/load a state
θ = [1.0, 2.0]                      # pretend these were estimated
state = ModelState(θ, 42, now(), "OLS on 2025-08-11")

path = joinpath("artifacts", "model_state.jls")
atomic_serialize(path, state)

restored = deserialize(path)
ModelState([1.0, 2.0], 42, DateTime("2025-08-18T21:21:38.316"), "OLS on 2025-08-11")

Tip: Keep snapshot files small and focused. Store big inputs (e.g., loan-level data) separately in efficient tabular formats and reference them via metadata (e.g., file hashes/paths) in the snapshot.

32.3.1.3 Example: Cross-session persistence with JLD2

  • JLD2 stores multiple named variables in one file and is less brittle across Julia versions than raw Serialization. Good default for sharing artifacts with colleagues.
using JLD2, Random, LinearAlgebra, Dates

X = hcat(ones(100), rand(100))
y = X * [1.0, 2.0] .+ 0.1 .* randn(100)
θ = X \ y

meta = (
    julia_version=string(VERSION),
    created_at=string(Dates.now()),
    description="OLS fit for prepayment speed model (toy example)",
)

mkpath("artifacts")
file = joinpath("artifacts", "ols_artifact_v2.jld2")
jldsave(file; θ, meta, X_size=size(X))

# Load (returns a tuple in the same order)
θ2, meta2, Xsz2 = JLD2.load(file, "θ", "meta", "X_size")

@assert Xsz2 == (100, 2)
@assert length(θ2) == 2

32.3.1.4 Example: Disk-Backed Memoization (Cache Expensive Results)

  • Cache outputs keyed by inputs to avoid re-running slow steps (e.g., pricing a large scenario set). Include a label and serialize atomically.
using SHA, Serialization

# Build a stable cache key from a label and arguments
function cachekey(label::AbstractString, args...; kwargs...)
    io = IOBuffer()
    print(io, label, '|', args, '|', kwargs)
    return bytes2hex(sha1(take!(io)))
end

function memoize_to_disk(f; label::AbstractString="f", cache_dir::AbstractString="cache")
    mkpath(cache_dir)
    return function (args...; kwargs...)
        key = cachekey(label, args...; kwargs...)
        path = joinpath(cache_dir, string(key, ".jls"))
        if isfile(path)
            return deserialize(path)
        else
            res = f(args...; kwargs...)
            # Atomic write
            tmp = tempname(cache_dir)
            serialize(tmp, res)
            mv(tmp, path; force=true)
            return res
        end
    end
end

# Example: cache an OLS fit (stand-in for a slow calibration)
ols = (X, y) -> X \ y
ols_cached = memoize_to_disk(ols; label="ols_v1")

# First call computes and caches; second call loads from disk
θa = ols_cached([ones(3) [1.0, 2.0, 3.0]], [1.0, 3.0, 5.0])
θb = ols_cached([ones(3) [1.0, 2.0, 3.0]], [1.0, 3.0, 5.0])
@assert θa == θb

Tip: Financial Modeling Pro Tip

  • For recurring production runs, use a directory convention like artifacts/YYYY-MM-DD/ with consistent filenames, and clean caches on a schedule to control disk use.

32.4 Model Validation

32.4.1 Static and dynamic validation

Static validation typically involves splitting the dataset into training and testing sets, where the testing set is held out and not used during model training. The model is trained on the training set and then evaluated on the held-out testing set to assess its performance. This approach helps to measure how well the model generalizes to unseen data.

The following example shows how to do a static validation in Julia.

using Random, Statistics, LinearAlgebra

# Synthetic time-indexed data
T = 200
x = rand(T)
y = 1.0 .+ 2.0 .* x .+ 0.1 .* randn(T)  # Vector response

# Chronological holdout (static validation)
cut = 150
Xtrain = hcat(ones(cut), x[1:cut])
ytrain = y[1:cut]
Xtest = hcat(ones(T - cut), x[(cut+1):end])
ytest = y[(cut+1):end]

# OLS fit
θ = Xtrain \ ytrain  # Vector length 2

# Predictions on the holdout
ŷ = Xtest * θ

# Metrics
mse = mean((ŷ .- ytest) .^ 2)
mae = mean(abs.(ŷ .- ytest))

println("Static validation (chronological holdout):")
println("Mean Squared Error (MSE): ", mse)
println("Mean Absolute Error (MAE): ", mae)
Static validation (chronological holdout):
Mean Squared Error (MSE): 0.008267067643781285
Mean Absolute Error (MAE): 0.0707896587375999

The following example shows how to do a dynamic validation in Julia.

using Random, Statistics, LinearAlgebra

# Reproducibility
Random.seed!(42)

# Simulate a simple linear data-generating process
T = 200
x = rand(T)
y = 1.0 .+ 2.0 .* x .+ 0.1 .* randn(T)  # y is a Vector (not an n×1 matrix)

# Walk-forward expanding-window validation: 1-step-ahead forecasts
initial_window = 60
sqerrs = Float64[]
abserrs = Float64[]

for t in (initial_window+1):T
    Xtr = hcat(ones(t - 1), x[1:(t-1)])
    ytr = y[1:(t-1)]

    θ = Xtr \ ytr  # OLS on past data only

    # 1-step-ahead prediction at time t
    ŷt = [1.0, x[t]]' * θ
    e = ŷt - y[t]

    push!(sqerrs, e^2)
    push!(abserrs, abs(e))
end

println("Dynamic validation (walk-forward expanding window):")
println("Mean Squared Error (MSE): ", mean(sqerrs))
println("Mean Absolute Error (MAE): ", mean(abserrs))
Dynamic validation (walk-forward expanding window):
Mean Squared Error (MSE): 0.012102241884186706
Mean Absolute Error (MAE): 0.08733803592034403
Note

Sometimes static and dynamic validation of a financial model can refer to the following analysis:

  • Static validation: whether the model reproduces the actual time-zero prices and balances (i.e., matches the actual starting values).
  • Dynamic validation: whether the model produces flows (e.g., cash flows, settlements) that are in line with historical experience, as sketched below.
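
A minimal sketch of both checks, using hypothetical balances and cash flows:

# Hypothetical balances and cash flows, for illustration only
actual_reserve_t0 = 1_250_000.0                        # starting balance per the source system
model_reserve_t0  = 1_243_500.0                        # starting balance produced by the model

actual_monthly_cf = [98_000.0, 101_500.0, 99_200.0]    # recent actual cash flows
model_monthly_cf  = [100_300.0, 100_100.0, 99_900.0]   # first projected months from the model

# Static validation: does the model reproduce the time-zero balance within tolerance?
static_error = abs(model_reserve_t0 - actual_reserve_t0) / actual_reserve_t0

# Dynamic validation: are projected flows in line with recent actual experience?
dynamic_errors = abs.(model_monthly_cf .- actual_monthly_cf) ./ actual_monthly_cf

println("Static validation error: ", round(100 * static_error; digits=2), "%")
println("Dynamic validation errors (%): ", round.(100 .* dynamic_errors; digits=2))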

32.4.2 Implied rate analysis

Implied rates are rates that are derived from the prices of financial instruments, such as bonds or options. For example, in the context of bonds, the implied rate is the interest rate that equates the present value of future cash flows from the bond (coupons and principal) to its current market price.

using Zygote

# Define the bond cash flows and prices
cash_flows = [100, 100, 100, 100, 1000]  # Coupons and principal
prices = [950, 960, 1010, 1020, 1050]  # Market prices

# Define a function to calculate the present value of cash flows given a rate
function present_value(rate, cash_flows)
    pv = 0
    for (i, cf) in enumerate(cash_flows)
        pv += cf / (1 + rate)^i
    end
    return pv
end

# Define a function to calculate the implied rate by root-finding (Newton's method, with gradients from Zygote)
function implied_rate(cash_flows, price)
    f(rate) = present_value(rate, cash_flows) - price
    return rootassign(f, 0.0, 1.0)
end
function rootassign(f, l, u)
    # Define an initial value
    x = 0.05
    # tolerance of difference in value
    tol = 1.0e-6
    # maximum number of iteration of the algorithm
    max_iter = 100
    iter = 0
    while abs(f(x)) > tol && iter < max_iter
        x -= f(x) / gradient(f, x)[1]
        iter += 1
    end
    if iter < max_iter && l < x < u
        return x
    else
        return -1.0
    end
end

# Calculate implied rates for each bond
implied_rates = [implied_rate(cash_flows, price) for price in prices]
# Print the results
for (i, rate) in enumerate(implied_rates)
    println("Implied rate for bond $i: $rate")
end
Implied rate for bond 1: 0.09658339166435045
Implied rate for bond 2: 0.09380219311021369
Implied rate for bond 3: 0.08046244727376842
Implied rate for bond 4: 0.0779014164014789
Implied rate for bond 5: 0.07041724037694008
Tip

JuliaActuary’s FinanceCore.jl provides a fast, robust irr function. More related utilities (e.g. present value) are found in ActuaryUtilities.jl.
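
For instance, a hedged sketch using irr (the exact call convention is an assumption here; consult the package documentation):

using FinanceCore

# Bond 1 from the example above: purchase at 950, then four 100 coupons and a final 1000.
# This assumes irr treats the first element as the time-zero flow and the remaining elements
# as evenly spaced periods; check the FinanceCore.jl documentation for the exact convention.
cfs = [-950.0, 100.0, 100.0, 100.0, 100.0, 1000.0]
irr(cfs)   # ≈ 0.0966, consistent with the Newton-method result above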

32.5 Predictive vs. Explanatory Model Assessments

Model assessment should be driven by the model’s purpose. A predictive model is judged by how well it forecasts targets under realistic deployment conditions. An explanatory (or structural) model is judged by how well its parameters are identified, interpretable, and stable under interventions—so that counterfactuals are credible.

Predictive assessment:

  • Define the forecast target and loss explicitly (point, quantile, probability, or full distribution).
  • Point forecasts (levels/returns): prioritize scale-aware losses such as RMSE and MAE. \[ \mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}(\hat{y}_t - y_t)^2}, \quad \mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}|\hat{y}_t - y_t| \] Avoid MAPE when values can be near zero; consider symmetric MAPE variants if needed.
  • Quantile forecasts (e.g., VaR at level \(\tau\)): use the pinball (quantile) loss, sketched in code after this list. \[ L_\tau(\hat{q}_t, y_t) = \left(\tau - \mathbf{1}\{y_t < \hat{q}_t\}\right)(y_t - \hat{q}_t) \]
  • Probabilistic forecasts (default probabilities, loss distributions): use the log score (negative log-likelihood), the Brier score for binary events, or CRPS for full distributions; evaluate calibration (reliability curves, PIT uniformity) and sharpness (narrowness of distributions).
  • Interval forecasts: evaluate coverage vs. nominal and average interval width (or the Winkler score).
  • Classification tasks (e.g., downgrade prediction): use AUROC/PR, calibration error, and expected utility with cost-sensitive thresholds.
  • Validation design: for time series, use walk-forward (rolling-origin) evaluation with realistic feature availability; guard against leakage; include transaction costs/slippage where applicable.
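
As a quick illustration, a minimal sketch of the pinball loss (the τ, y, and q̂ values below are purely illustrative):

using Statistics

# Pinball (quantile) loss at level τ
pinball(q̂, y, τ) = (τ - (y < q̂)) * (y - q̂)

τ = 0.99                             # e.g., a 99th-percentile (VaR-style) forecast
y = [-0.021, 0.004, -0.035, 0.012]   # realized returns
q̂ = [0.025, 0.027, 0.024, 0.026]     # forecast 99th-percentile returns

mean(pinball.(q̂, y, τ))              # average pinball loss over the evaluation window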

Explanatory assessment:

  • Parameter interpretability: signs, magnitudes, and units align with theory and domain knowledge; elasticities and risk premia fall within plausible ranges.
  • Identification: demonstrate that parameters are uniquely recoverable from the data/design (e.g., instrument relevance/exogeneity for IV; rank conditions for GMM); report overidentification tests where applicable (e.g., Hansen’s J).
  • Stability/invariance: test parameter constancy across regimes (structural break tests such as Chow; rolling/CUSUM diagnostics); assess sensitivity to alternative samples and specifications (a rolling-window sketch follows this list).
  • Counterfactual validity: show that the model’s structural relationships remain invariant under contemplated interventions (policy changes, shocks); evaluate out-of-sample counterfactual predictions when historical policy variation exists.
  • Moment fit for structural models: report the distance between empirical and model-implied moments; assess which moments are well matched and which are not.
  • Sensitivity/uncertainty analysis: local and global parameter sensitivity (e.g., Sobol indices), posterior uncertainty where Bayesian, and scenario robustness for key assumptions.
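
As a minimal sketch of the stability idea (a rolling-window coefficient check rather than a formal Chow or CUSUM test, on simulated data with a known break):

using Random

Random.seed!(1234)

# Simulated data with a deliberate structural break halfway through (illustrative)
T = 300
x = rand(T)
β = vcat(fill(2.0, 150), fill(2.5, 150))   # slope shifts from 2.0 to 2.5 at t = 151
y = 1.0 .+ β .* x .+ 0.1 .* randn(T)

# Rolling-window OLS slope as a simple stability diagnostic
window = 60
slopes = [(hcat(ones(window), x[t:(t+window-1)]) \ y[t:(t+window-1)])[2]
          for t in 1:(T-window+1)]

# A wide range of rolling slopes suggests the parameter is not stable across the sample
println("Rolling slope range: ", extrema(slopes))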

Summary mapping

  • Goal: predict point. Primary metrics/loss: RMSE, MAE, MAPE (with caution). Validation design: walk-forward CV; leakage checks. Finance examples: forecast next-month returns, prepayment speeds.
  • Goal: predict quantile. Primary metrics/loss: pinball loss at τ. Validation design: rolling quantile backtests. Finance examples: VaR at 99%.
  • Goal: predict probability/distribution. Primary metrics/loss: log score, Brier, CRPS; calibration and sharpness. Validation design: time-stamped splits; reliability/PIT checks. Finance examples: default probability, loss distribution for stress.
  • Goal: classification. Primary metrics/loss: AUROC/PR, calibration error, expected utility. Validation design: cost-sensitive thresholds; class-imbalance handling. Finance examples: downgrade/watchlist prediction.
  • Goal: explanatory (structural). Primary metrics/loss: identification tests, parameter plausibility, moment fit. Validation design: stability/structural-break tests; sensitivity analysis. Finance examples: term-structure model, demand/supply elasticities.
  • Goal: counterfactual. Primary metrics/loss: invariance under intervention; policy simulation accuracy. Validation design: natural experiments; out-of-sample policy periods. Finance examples: impact of capital requirement change on lending.

Tip: Finance Modeling Pro-tip

Align the loss you optimize in estimation with the metric you report in evaluation. If your risk committee cares about 99% tail losses, train and evaluate on quantile/tail losses, not just RMSE.