“All models are wrong, but some are useful.” - George Box (1976)
32.1 Chapter Overview
A grab-bag of practical techniques for keeping models honest: sanity checks, serialization patterns for reproducibility, validation workflows, and ways to think about whether a model is doing what you think it’s doing.
32.2 General Modeling Techniques
32.2.1 Taking Things to the Extreme
Before trusting any model, ask: what happens at the edges? Set interest rates to zero, or negative. Assume 100% lapse. Perfectly correlated defaults. An illiquid market with zero trades. These extreme scenarios often reveal assumptions you didn’t know you had made.
Consider a simple loan loss model. It might work perfectly well under normal conditions, but what happens when recovery rates hit zero? Does the code handle that gracefully, or does it divide by something that’s now zero? Extreme thought experiments surface these hidden assumptions before production does.
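As a sketch of this kind of edge-case sweep (the function, figures, and parameter names are illustrative, not from any particular production model):

```julia
# Illustrative expected-loss function with an explicit guard on the
# recovery-rate input, so the zero-recovery edge case fails loudly
# rather than silently producing nonsense.
function expected_loss(ead, pd, recovery)
    0.0 <= recovery <= 1.0 || throw(DomainError(recovery, "recovery must be in [0, 1]"))
    return ead * pd * (1 - recovery)   # exposure × default prob × loss given default
end

# Push every input to its edge and confirm nothing misbehaves
for recovery in (0.0, 0.5, 1.0), pd in (0.0, 1.0)
    @assert isfinite(expected_loss(1_000_000, pd, recovery))
end
```

The loop is the thought experiment made executable: every corner of the input space gets evaluated once, before production finds the corner for you.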
32.2.2 Range Bounding
Sometimes you don’t need the answer—you just need to know that the answer is good enough. If both a pessimistic and an optimistic estimate clear your hurdle, you’re done.
Here’s a classic example from interview lore: you need to determine whether a mortgaged property’s value exceeds the $100,000 loan balance. No appraisal available. But you know that a comparable house in worse condition sold for $100 per square foot, and from the floor plan this house must be at least 1,000 square feet. So:
At $100 per square foot times at least 1,000 square feet, the value floor is $100,000. Since this house is in better condition than the comparable, its value almost certainly exceeds the loan balance. No complex modeling required.
This technique is particularly useful in early scoping meetings or ad-hoc regulatory requests where a directional answer is all you need.
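The mortgage example reduces to a one-line check (figures are the illustrative ones from the example):

```julia
# Conservative lower bound on property value vs. the loan balance
min_sqft, worst_price_per_sqft, loan_balance = 1_000, 100, 100_000
lower_bound = min_sqft * worst_price_per_sqft   # worst case: 1,000 sq ft at $100/sq ft
lower_bound >= loan_balance                      # true => good enough, stop here
```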
32.2.3 Pseudo-Monte Carlo Sanity Checks
Before committing to a massive simulation run, do a miniature version first. Fix the random seed, use a handful of scenarios, and verify that everything works end to end. This catches problems like:
- Configuration files that aren’t being read correctly
- Aggregation logic that breaks on edge cases
- Performance bottlenecks that will be painful at scale
A ten-scenario dry run that takes five seconds can save you from discovering bugs halfway through an overnight batch job.
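A hedged sketch of such a dry run, where `simulate_scenario` stands in for the real pipeline:

```julia
using Random

# Stand-in for the full scenario pipeline: one year of monthly shocks
simulate_scenario(rng) = sum(randn(rng, 12))

function dry_run(; n_scenarios=10, seed=2024)
    rng = MersenneTwister(seed)      # fixed seed => reproducible results
    results = [simulate_scenario(rng) for _ in 1:n_scenarios]
    @assert length(results) == n_scenarios
    @assert all(isfinite, results)   # aggregation sanity check
    return results
end

dry_run()
```

Because the seed is fixed, two runs of `dry_run()` produce identical results, which also makes the miniature run a convenient regression test.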
32.2.4 Model Validation
32.2.4.1 Static vs. Dynamic
Model validation is essential. The most common validation approach is static: split your data chronologically, fit on the earlier period, and test on the later period. This tells you how well the model generalizes to unseen data.
Dynamic validation (sometimes called walk-forward validation) is more demanding: at each time step, you only use data available up to that point. This mimics how the model would actually be used in production.
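A minimal walk-forward loop, using a trailing mean as a stand-in "model" (the series and burn-in length are illustrative):

```julia
# At each step t, fit only on data observed through t, then score the
# one-step-ahead forecast. No future data ever leaks into the fit.
function walk_forward(y; burn_in=20)
    errors = Float64[]
    for t in burn_in:length(y)-1
        m = sum(y[1:t]) / t            # "model": trailing mean of data up to t
        push!(errors, abs(y[t+1] - m)) # one-step-ahead absolute error
    end
    return sum(errors) / length(errors)  # mean absolute error
end

y = cumsum(randn(200)) .* 0.1 .+ 1.0   # illustrative series
walk_forward(y)
```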
In some contexts, “static” and “dynamic” validation mean something different: static validation checks whether the model reproduces time-zero prices or balances, while dynamic validation checks whether projected cashflows match historical trends.
32.2.4.2 Implied Rates
Implied rates are a form of model inversion: given an observed price, what rate would produce that price? If your pricing function and your implied-rate function don’t round-trip consistently, something is wrong.
```julia
using Zygote

function present_value(rate, cash_flows)
    sum(cf / (1 + rate)^i for (i, cf) in enumerate(cash_flows))
end

function implied_rate(cash_flows, price)
    f(r) = present_value(r, cash_flows) - price
    # Newton's method using autodiff for the derivative
    x = 0.05
    for _ in 1:100
        fx = f(x)
        abs(fx) < 1e-6 && return x
        x -= fx / gradient(f, x)[1]
    end
    return NaN  # didn't converge
end

cash_flows = [100, 100, 100, 100, 1100]
prices = [950, 1000, 1050]
for price in prices
    r = implied_rate(cash_flows, price)
    println("Price $price → rate $(round(r*100, digits=2))%")
end
```
JuliaActuary’s FinanceCore.jl provides a robust `irr` function that handles edge cases better than a hand-rolled Newton’s method.
32.2.5 Predictive vs. Explanatory Models
Models serve different masters. A predictive model needs to forecast accurately; an explanatory model needs to tell a coherent story about why things happen. The validation approach should match the purpose.
For prediction, pick a loss function that matches how the forecast will be used. If you’re forecasting claims payments, RMSE or MAE make sense. If you’re estimating Value-at-Risk, use a quantile loss that rewards accurate tail placement. If you’re producing full distributions, consider the Brier score or CRPS.
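The quantile ("pinball") loss mentioned above fits in one line; the level and figures below are illustrative:

```julia
# Quantile loss at level τ: penalizes under-prediction of a high quantile
# far more heavily than over-prediction.
quantile_loss(y, ŷ, τ) = y >= ŷ ? τ * (y - ŷ) : (1 - τ) * (ŷ - y)

# At τ = 0.99, landing below the realized loss costs 99x more than landing above
quantile_loss(150.0, 100.0, 0.99)  # under-predicted: 0.99 * 50 = 49.5
quantile_loss(100.0, 150.0, 0.99)  # over-predicted:  0.01 * 50 = 0.5
```

This asymmetry is exactly what "rewards accurate tail placement" means: a VaR model that habitually sits below realized tail losses accumulates loss quickly.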
For explanation, the bar is different. Coefficients should have sensible signs and magnitudes—a lapse elasticity of –0.3 per 100 bps rate change is something you can discuss with product actuaries. The model should be stable across different time periods, and it should remain plausible under counterfactual scenarios (“what if we changed surrender charges?”).
Tip: Financial Modeling Pro Tip
Align the loss you optimize with the metric you report. If the risk committee cares about 99th percentile losses, train and evaluate on quantile losses—not just RMSE.
32.2.6 Causal Modeling
Causal modeling addresses an important distinction: most financial models capture correlation, not causation. That’s often fine for prediction, but dangerous for “what-if” analysis. If you want to know what happens when you change something, you need causal reasoning.
Judea Pearl’s work on directed acyclic graphs (DAGs) provides a framework for this. The basic idea: draw arrows between variables to represent direct causal influence, then use the graph to determine what you need to control for (and what you shouldn’t).
A few patterns come up repeatedly:
- **Confounders** drive both the treatment and the outcome. Macro growth affects both lending standards and default rates. If you don’t account for it, you’ll see a spurious relationship between standards and defaults.
- **Mediators** sit on the causal pathway. A capital rule affects lending supply, which affects loan growth. If you control for lending supply, you block part of the effect you’re trying to measure.
- **Colliders** are caused by two other variables. Regulation intensity and market stress both affect media coverage. If you control for media coverage, you create a spurious correlation between regulation and stress.
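The confounder pattern can be simulated directly. In this hedged sketch, macro growth `g` drives both lending standards `s` and defaults `d`, and `s` has no direct effect on `d` at all (the coefficients are arbitrary):

```julia
using Random, Statistics

rng = MersenneTwister(1)
g = randn(rng, 10_000)               # confounder: macro growth
s = 0.8 .* g .+ randn(rng, 10_000)   # lending standards, driven by g
d = -0.8 .* g .+ randn(rng, 10_000)  # defaults, driven by g (not by s)

cor(s, d)                            # clearly nonzero, entirely spurious
cor(s .- 0.8 .* g, d .+ 0.8 .* g)    # ≈ 0 once g's influence is removed
```

The raw correlation is strongly negative even though standards never touch defaults; residualizing on the confounder makes it vanish. That is what "controlling for" means operationally.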
This matters because financial regulators and boards increasingly ask “what happens if we do X?” Answering that question requires thinking carefully about causal structure, not just fitting the best predictive model. See Pearl (2009) for more on this topic.
32.2.7 Other Techniques Worth Knowing
A few topics we won’t cover in depth but that are worth exploring:

- **Quasi-Monte Carlo** uses low-discrepancy sequences (Sobol, Halton) instead of pseudo-random numbers. For high-dimensional integrals like exotic option pricing or nested ALM, this can dramatically reduce variance.
- **Variance reduction techniques**—control variates, antithetic paths, stratification—shrink simulation error without adding more scenarios. Useful when estimating Greeks or tail percentiles.
- **Scenario reduction algorithms** compress thousands of economic scenarios into a representative subset while preserving risk metrics. Kantorovich distance pruning is one approach.
- **Reverse stress testing** inverts the usual question: instead of “what’s the loss under scenario X?”, ask “what scenario produces loss Y?” This can surface vulnerabilities that standard stress grids miss.
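To make one of these concrete, here is a hedged sketch of antithetic paths, estimating E[exp(Z)] for Z ~ N(0, 1), whose true value is e^(1/2) ≈ 1.6487:

```julia
using Random, Statistics

# Pair each normal draw z with its mirror -z. Because exp is monotone,
# exp(z) and exp(-z) are negatively correlated, so their pairwise
# average has lower variance than an independent draw.
rng = MersenneTwister(7)
z = randn(rng, 50_000)

plain      = exp.(z)                     # plain Monte Carlo samples
antithetic = (exp.(z) .+ exp.(-z)) ./ 2  # antithetic-pair averages

mean(plain), mean(antithetic)    # both ≈ exp(0.5), but ...
var(antithetic) < var(plain)     # ... the paired estimator is tighter
```

Same number of function evaluations, noticeably smaller standard error: that is the whole pitch of variance reduction.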
32.3 Programming Techniques
32.3.1 Serialization
Serialization is important because in most finance workflows, the slow part isn’t the regression—it’s the data prep, calibration, and scenario generation that come before. If you’re running the same expensive calibration every time you tweak something downstream, you’re wasting compute and making audits harder.
Serialization lets you checkpoint expensive intermediate results. The question is which format to use:
| Format | Good for | Watch out for |
|---|---|---|
| `Serialization` stdlib | Quick caches, memoization | Breaks across Julia versions |
| JLD2 | Persisting results across sessions | Still Julia-specific |
| Arrow/Parquet | Large tables, cross-language sharing | Not for arbitrary Julia types |
| CSV/JSON/TOML | Configs, small tables, human-readable | Slow, lossy for binary data |
Here’s a pattern for saving model state with atomic writes (so you don’t end up with half-written files if something crashes):
```julia
using Dates, Serialization

struct ModelState
    θ::Vector{Float64}    # fitted parameters (example)
    seed::Int64           # RNG seed used for the run
    timestamp::DateTime   # when the snapshot was created
    note::String          # short description
end

# Atomic write to avoid half-written files
function atomic_serialize(path::AbstractString, obj)
    dir = dirname(path)
    mkpath(dir)
    tmp = tempname(dir)
    serialize(tmp, obj)
    mv(tmp, path; force=true)
    return path
end

# Example: save/load a state
θ = [1.0, 2.0]  # pretend these were estimated
state = ModelState(θ, 42, now(), "OLS on 2025-08-11")
path = joinpath("artifacts", "model_state.jls")
atomic_serialize(path, state)
restored = deserialize(path)
```

```
ModelState([1.0, 2.0], 42, DateTime("2026-02-09T18:57:26.662"), "OLS on 2025-08-11")
```
For cross-session persistence where you might share artifacts with colleagues, JLD2 is more robust.
```julia
using JLD2, Random, LinearAlgebra, Dates

X = hcat(ones(100), rand(100))
y = X * [1.0, 2.0] .+ 0.1 .* randn(100)
θ = X \ y

meta = (
    julia_version = string(VERSION),
    created_at    = string(now()),
    description   = "OLS fit example",
)

mkpath("artifacts")
jldsave("artifacts/example.jld2"; θ, meta)
θ_loaded, meta_loaded = JLD2.load("artifacts/example.jld2", "θ", "meta")
```
The key insight: serialize fitted parameters, calibrated curves, and expensive intermediate results. Don’t serialize raw data—keep that in efficient columnar formats and reference it by path (and ideally by content hash) in your artifact metadata. And remember that modern CPUs are fast enough that in many cases it is quicker to recompute an answer than to load a cached copy from disk.
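A hedged sketch of recording a content hash, using Julia’s SHA standard library (the file, field names, and metadata layout are illustrative):

```julia
using SHA

# Hash the raw-data file so later runs can verify the artifact was
# built from exactly the same inputs.
function file_sha256(path::AbstractString)
    open(path) do io
        bytes2hex(sha256(io))
    end
end

# Temp file standing in for the raw data referenced by the artifact
path, io = mktemp()
write(io, "id,balance\n1,100\n")
close(io)
meta = (data_path = path, data_sha256 = file_sha256(path))
```

Storing `data_sha256` alongside the path means a changed input file is detected immediately, rather than silently producing a different model.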
32.3.2 Memoization
Memoization is caching function results keyed by their inputs. For expensive computations that get called repeatedly with the same arguments, this can be a huge win.
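A minimal sketch using a `Dict` keyed by the arguments (the discounting function is a toy stand-in; packages like Memoize.jl provide this via a macro):

```julia
# Cache keyed on (rate, years); get! computes and stores on a miss,
# and returns the stored value on a hit.
const PV_CACHE = Dict{Tuple{Float64,Int},Float64}()

function discounted_value(rate::Float64, years::Int)
    get!(PV_CACHE, (rate, years)) do
        sleep(0.01)             # stand-in for an expensive computation
        (1 + rate)^(-years)
    end
end

discounted_value(0.05, 10)   # first call: computed and cached
discounted_value(0.05, 10)   # second call: served from the cache
```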
For recurring production runs, use a directory convention like artifacts/YYYY-MM-DD/ and clean old caches on a schedule. Otherwise disk usage creeps up over time.
32.3.3 Automated Benchmarks
If you have a pricing engine or cash-flow projection that runs nightly, maintain a small set of benchmark portfolios with known expected outputs. Run them automatically and alert if results drift. This catches numerical regressions before they reach production—and gives you confidence when refactoring.
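A hedged sketch of such a check, with a single frozen benchmark whose expected output is known analytically (a 5% coupon bond at a 5% yield prices at par):

```julia
present_value(rate, cfs) = sum(cf / (1 + rate)^t for (t, cf) in enumerate(cfs))

# Frozen benchmark portfolios with known expected outputs (illustrative)
const BENCHMARKS = [
    (name="5y par bond", rate=0.05, cfs=[50, 50, 50, 50, 1050], expected=1000.0),
]

function run_benchmarks(; rtol=1e-6)
    failures = String[]
    for b in BENCHMARKS
        got = present_value(b.rate, b.cfs)
        isapprox(got, b.expected; rtol=rtol) || push!(failures, "$(b.name): got $got")
    end
    isempty(failures) || error("Benchmark drift: ", join(failures, "; "))
    return true
end

run_benchmarks()
```

Wire `run_benchmarks()` into the nightly job (or CI) so any numerical drift raises an error instead of flowing quietly into reported results.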
Pearl, Judea. 2009. Causality: Models, Reasoning, and Inference. 2nd ed. New York: Cambridge University Press.