7 Data and Types

Alec Loudenback

I am only one, but I am one. I can’t do everything, but I can do something. The something I ought to do, I can do. And by the grace of God, I will - Edward Everett Hale (1902)

7.1 Chapter Overview

The benefits of assigning types to data within a model’s system, some examples of using types to simplify a program’s logic, and a comparison of different type-related approaches to program organization (such as object-oriented design versus composition).

7.2 Using Types to Value a Portfolio

We will assemble the tools and terminology to value a portfolio of assets by leveraging types (@sec-data-types). Using the constructs introduced in the prior chapter, we can describe the portfolio valuation as additively reducing the mapped value of assets in the portfolio. If value is our valuation function, we are trying to do the following:

mapreduce(value,+,portfolio)

The challenge is how to design an all-purpose value function. In a portfolio, the assets may be heterogeneous, so we will need to define what the valuation semantics are for the different kinds of assets. To get to our end goal, we will need to:

Define the different kinds of assets within our portfolio.
Define how the assets are to be valued.

We will accomplish this by utilizing data types.
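As a minimal sketch of the map-then-reduce pattern on its own, the following applies a stand-in function to each element of a collection and sums the results. The `square` function and the sample numbers are purely illustrative assumptions and are not part of the valuation model.

```julia
# Illustrative stand-in for a valuation function (hypothetical, not from the text)
square(x) = x^2

holdings = [1, 2, 3]

# Map `square` over each element, then reduce the results with `+`
total = mapreduce(square, +, holdings)          # 1 + 4 + 9 = 14

# Equivalent two-step version: map first, then sum
total_two_step = sum(map(square, holdings))     # 14
```

The single `mapreduce` call avoids allocating the intermediate array that the two-step version creates.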

7.3 Benefits of Using Types

As a preview of why we want to utilize types in our program, there are a number of benefits:

Separate concerns. For example, the logic for valuing an option need not know how we value a bond. The code and associated logic are kept distinct, which is easier to reason about and to test.
Re-use code. When a set of types within a hierarchy all share the same logic, then we can define the method at the highest relevant level and avoid writing the method for each possible type. In our simple example we won’t get as much benefit here since the hierarchy is simple and the set of types small.
Dispatch on type. By defining types for our assets, we can use multiple dispatch to define specialized behavior for each type. This allows us to write generic code that works with any asset type, and the Julia compiler will automatically select the appropriate method based on the type of the asset at runtime. This is a powerful feature that enables extensibility and modularity in our code.
Improve readability and clarity. By defining types for our assets, we make our code more expressive and self-documenting. The types provide a clear indication of what kind of data we are working with, making it easier for other developers (or ourselves in the future) to understand and maintain the codebase.
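Enable type safety. By specifying the expected types for function arguments and return values, we can catch type-related errors early: in Julia, a call with unsupported argument types fails immediately with a MethodError at the call itself, rather than surfacing as a confusing failure deep inside a calculation. This helps prevent bugs and makes our code more robust.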
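The following sketch illustrates a few of these benefits with a tiny hierarchy. The names `AbstractInstrument`, `Stock`, `EuropeanOption`, `underlying`, and `describe` are hypothetical and exist only for this illustration; they are not the types used for the portfolio in this chapter.

```julia
# Hypothetical hierarchy for illustration only
abstract type AbstractInstrument end

struct Stock <: AbstractInstrument
    ticker::String
end

struct EuropeanOption <: AbstractInstrument
    ticker::String
    strike::Float64
end

# Re-use: defined once at the top of the hierarchy, applies to every subtype
underlying(inst::AbstractInstrument) = inst.ticker

# Dispatch: a generic fallback plus a specialized method
describe(inst::AbstractInstrument) = "instrument on " * underlying(inst)
describe(opt::EuropeanOption) =
    "option on " * underlying(opt) * " struck at " * string(opt.strike)

stock_description = describe(Stock("ABC"))                   # generic method
option_description = describe(EuropeanOption("ABC", 50.0))   # specialized method

# An unsupported argument type fails at the call itself with a MethodError
threw_method_error = try
    describe("not an instrument")
    false
catch err
    err isa MethodError
end
```

Note that the type mismatch is reported when the call is made, which is earlier and clearer than a failure somewhere inside the function body.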

With these benefits in mind, let’s start by defining the types for our assets. We’ll create an abstract type called AbstractAsset that will serve as the parent type for all our asset types. If you haven’t read it already, Section 5.4.7 is a good reference for details on types at the language level (this section is focused on organization and building up the abstracted valuation process).

7.4 Defining Types for Portfolio Valuation

We will define two kinds of assets in this simplified universe:

Cash
Risk Free Bonds (coupon and zero-coupon varieties)

To do the valuation of these, we need some economic parameters as well: risk free rates for discounting.

Here’s the outline of what follows to get an understanding of types, type hierarchy, and multiple dispatch.

Define the Cash and Bond types.
Define the most basic economic parameter set.
Define the value functions for Cash and Bonds.

## Data type definitions
abstract type AbstractAsset end                  # (1)

struct Cash <: AbstractAsset                     # (3)
    balance::Float64
end

abstract type AbstractBond <: AbstractAsset end  # (2)

struct CouponBond <: AbstractBond
    par::Float64
    coupon::Float64
    tenor::Int
end

struct ZeroCouponBond <: AbstractBond
    par::Float64
    tenor::Int
end

1: General convention is to name abstract types beginning with Abstract...
2: There can exist an abstract type which is a subtype of another abstract type.
3: We define concrete data types (structs) with the fields necessary for valuing those assets.
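To see how these definitions relate to one another, the hierarchy can be inspected with a few built-in functions. This is a small sketch that repeats a subset of the definitions above so that it runs on its own.

```julia
abstract type AbstractAsset end
abstract type AbstractBond <: AbstractAsset end

struct Cash <: AbstractAsset
    balance::Float64
end

struct ZeroCouponBond <: AbstractBond
    par::Float64
    tenor::Int
end

# Walk up the hierarchy from a concrete type
supertype(ZeroCouponBond)            # AbstractBond
supertype(AbstractBond)              # AbstractAsset

# `<:` tests the subtype relationship, including indirect ancestors
ZeroCouponBond <: AbstractAsset      # true
Cash <: AbstractBond                 # false

# Only concrete types can be instantiated
isabstracttype(AbstractBond)         # true
isconcretetype(Cash)                 # true

# `isa` checks whether a value belongs to a type
zcb = ZeroCouponBond(100.0, 5)
zcb isa AbstractBond                 # true
```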

Now to define the economic parameters:

struct EconomicAssumptions{T}
  riskfree::T
end

This is a parametric type because later on we will vary what objects we use for riskfree. For now, we will use simple scalar values, like in this potential scenario:

econ_baseline = EconomicAssumptions(0.05)

EconomicAssumptions{Float64}(0.05)
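As a brief sketch of what the type parameter does, the constructor infers `T` from whatever is passed in. The vector of rates below is a hypothetical alternative input, used only to show that the parameter changes with the argument.

```julia
struct EconomicAssumptions{T}
    riskfree::T
end

# T is inferred from the argument passed to the constructor
scalar_econ = EconomicAssumptions(0.05)
typeof(scalar_econ)                  # EconomicAssumptions{Float64}

# Hypothetical alternative: a vector of rates, one per period
vector_econ = EconomicAssumptions([0.03, 0.04, 0.05])
typeof(vector_econ)                  # EconomicAssumptions{Vector{Float64}}

# Both are still EconomicAssumptions, so a method written for the
# unparameterized name accepts either one
scalar_econ isa EconomicAssumptions  # true
vector_econ isa EconomicAssumptions  # true
```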

Now on to defining the valuation for Cash and AbstractBonds. Cash is always equal to its balance:


value(asset::Cash, ea::EconomicAssumptions) = asset.balance

value (generic function with 1 method)

Risk free bonds are valued as the discounted present value of their riskless cashflows. We first define a method that generically operates on any fixed bond. All that’s left to do is for different types of bonds to define how much cashflow occurs at a given point in time by defining cashflow for the associated type.

function value(asset::AbstractBond, r::Float64)  # (2)
    discount_factor = 1.0
    pv = 0.0  # running present value (named pv to avoid shadowing the function name)
    for t in 1:asset.tenor
        discount_factor /= (1 + r)  # (1)
        pv += discount_factor * cashflow(asset, t)
    end
    return pv
end

function cashflow(bond::CouponBond, time)
    if time == bond.tenor
        # final period: coupon plus return of par
        (1 + bond.coupon) * bond.par
    else
        bond.coupon * bond.par
    end
end

function value(bond::ZeroCouponBond, r::Float64)  # (3)
    return bond.par / (1 + r)^bond.tenor
end

1: x /= y, x += y, etc. are shorthand ways to write x = x / y or x = x + y
2: value is defined for AbstractBonds in general…
3: … and then more specifically for ZeroCouponBonds. This will be explained when discussing “dispatch” below.

value (generic function with 3 methods)
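As a sanity check on the discounting loop, the sketch below compares it against a hand-written sum of discounted cashflows for a three-year bond. The bond parameters and the 4% rate are illustrative assumptions; the definitions are repeated in condensed form so the block runs on its own.

```julia
abstract type AbstractBond end

struct CouponBond <: AbstractBond
    par::Float64
    coupon::Float64
    tenor::Int
end

cashflow(bond::CouponBond, time) =
    time == bond.tenor ? (1 + bond.coupon) * bond.par : bond.coupon * bond.par

function value(asset::AbstractBond, r::Float64)
    discount_factor = 1.0
    pv = 0.0
    for t in 1:asset.tenor
        discount_factor /= (1 + r)
        pv += discount_factor * cashflow(asset, t)
    end
    return pv
end

bond = CouponBond(100.0, 0.05, 3)

# Hand-written sum of CF_t / (1 + r)^t with cashflows 5, 5, 105 at r = 4%
direct = 5.0 / 1.04 + 5.0 / 1.04^2 + 105.0 / 1.04^3

matches_direct = value(bond, 0.04) ≈ direct   # loop agrees with the explicit sum
prices_at_par = value(bond, 0.05) ≈ 100.0     # coupon equal to the discount rate gives par
```

The comparisons use `≈` (`isapprox`) because the two calculations accumulate floating point rounding in a different order.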

7.4.1 Dispatch

When a function is called, the computer has to decide which method to use. In the example above, when we want to value a ZeroCouponBond, does the value(asset::AbstractBond, r) or value(bond::ZeroCouponBond, r) version get used?

Dispatch is the process of determining the right method to use, and the rule is that the most specific defined method gets used. In this case, that means that even though our ZeroCouponBond is an AbstractBond, the routine that will be used is the most specific one: value(bond::ZeroCouponBond, r).

Already, this is a powerful tool to simplify our code. Imagine the alternative of a long chain of conditional statements trying to find the right logic to use:

# don't do this!
function value(asset,r)
    if asset.type == "ZeroCouponBond"
        # special code for Zero coupon bonds
        # ...
    elseif asset.type == "ParBond"
        # special code for Par bonds
        # ...
    elseif asset.type == "AmortizingBond"
        # special code for Amortizing Bonds
        # ...
    else
        # here define the generic AbstractBond logic
    end
end

With dispatch, the compiler does this lookup for us, and more efficiently than enumerating a list of possible codepaths.
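To illustrate the contrast with the conditional chain, here is a sketch of adding a new bond type without touching the existing valuation code. The `AmortizingBond` type and its level-principal repayment schedule are assumptions made for this example only; the generic `value` method is repeated so the block is self-contained.

```julia
abstract type AbstractBond end

# Generic valuation: unchanged when new bond types are added
function value(asset::AbstractBond, r::Float64)
    discount_factor = 1.0
    pv = 0.0
    for t in 1:asset.tenor
        discount_factor /= (1 + r)
        pv += discount_factor * cashflow(asset, t)
    end
    return pv
end

# Hypothetical new type: repays equal principal each period,
# plus interest on the balance outstanding at the start of the period
struct AmortizingBond <: AbstractBond
    par::Float64
    coupon::Float64
    tenor::Int
end

function cashflow(bond::AmortizingBond, time)
    principal = bond.par / bond.tenor
    outstanding = bond.par - principal * (time - 1)
    return principal + bond.coupon * outstanding
end

# Only `cashflow` was defined; dispatch routes the call to the generic `value`
amortizing_value = value(AmortizingBond(100.0, 0.05, 5), 0.05)
```

Because interest is paid at the same rate used for discounting, the result should be approximately par (100.0), which gives a quick check that the new type is wired in correctly.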

In our definitions of value above, we used a simple scalar interest rate to discount the cash flows. Note how in the definition of value for ZeroCouponBond, we have defined a more specific signature: both the first and second arguments are specific, concrete types. When we call value(ZeroCouponBond(100.0,3),0.05), we avoid the loop that’s defined in the generic case and jump immediately to a more efficient definition of its value. This is dispatching on the combination of types and picking the most relevant (specific) version for what has been passed to it.

Despite the definitions above, the following will error because we haven’t defined a method for value which takes an EconomicAssumptions as its second argument:

#| error: true
value(ZeroCouponBond(100.0,5),econ_baseline)

Let’s fix that by defining a method which takes the economic assumption type and just relays the relevant risk free rate to the value methods already defined (which take an AbstractBond and a scalar r).

value(bond::AbstractBond,econ::EconomicAssumptions) = value(bond,econ.riskfree)

value (generic function with 4 methods)

Now the following works:

value(ZeroCouponBond(100.0, 5), econ_baseline)

78.35261664684589

Here’s an example of how this would be used:

portfolio = [
    Cash(50.0),
    CouponBond(100.0, 0.05, 5),
    ZeroCouponBond(100.0, 5),
]

map(asset-> value(asset,econ_baseline), portfolio)

3-element Vector{Float64}:
 50.0
 99.99999999999999
 78.35261664684589

This is very close to the goal that we set out in Section 7.2. We can complete it by reducing over the collection to sum up the value:

mapreduce(asset -> value(asset,econ_baseline), +, portfolio)

228.3526166468459

Note

This code:

mapreduce(asset-> value(asset,econ_baseline), +, portfolio)

is more verbose than what we set out to do at the start (mapreduce(value,+,portfolio)) because the two-argument value function requires a second argument for the economic variables. This works well! However, there is a way to define it which avoids the anonymous function, which in some cases will end up needing to be compiled more frequently than you want it to. Sometimes we want a lightweight, okay-to-compile-on-the-fly function. Other times, we know it’s something that will be passed around in compute-intensive parts of the code. A technique in this situation is to define an object which “locks in” one of the arguments but behaves like the anonymous version. There is a pair of types in the Base module, Fix1 and Fix2, which represent partially-applied versions of the two-argument function f, with the first or second argument fixed to the value “x”.

That is, Base.Fix1(f, x) behaves like y->f(x, y) and Base.Fix2(f, x) behaves like y->f(y, x).
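A small sketch with subtraction, where argument order matters, makes the difference between the two concrete. The names `ten_minus` and `minus_ten` are illustrative.

```julia
# Fix the first argument of `-`: behaves like y -> 10 - y
ten_minus = Base.Fix1(-, 10)
ten_minus(3)    # 7

# Fix the second argument of `-`: behaves like y -> y - 10
minus_ten = Base.Fix2(-, 10)
minus_ten(3)    # -7

# A fixed function can be passed anywhere a one-argument function is expected
squares = map(Base.Fix2(^, 2), [1, 2, 3])   # [1, 4, 9]
```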

In the context of our valuation model, this would look like:

val = Base.Fix2(value,econ_baseline)
mapreduce(val,+,portfolio)

228.3526166468459

7.4.1.1 Multiple Dispatch

A more general concept is that of multiple dispatch, where the types of all arguments are used to determine which method to use. This is a very general paradigm, and in many ways is more extensible than traditional object-oriented approaches (more on that in Section 7.5). What if, instead of a scalar interest rate value, we wanted to pass an object that represented a term structure of interest rates?

Extending the example, we can use a time-varying risk free rate instead of a constant. For fun, let’s say that the risk free rate has a sinusoidal pattern:

econ_sin = EconomicAssumptions(t -> 0.05 + sin(t) / 100)

EconomicAssumptions{var"#5#6"}(var"#5#6"())

Now value will not work, because we’ve only defined how value works on bonds if the given rate is a Float64 type:

#| error: true
value(ZeroCouponBond(100.0, 5), econ_sin)

We can extend our methods to account for this:

function value(bond::ZeroCouponBond, r::T) where {T<:Function}  # (1)
    return bond.par / (1 + r(bond.tenor))^bond.tenor  # (2)
end

1: The r::T ... where {T<:Function} says use this method if r is any concrete subtype of the (abstract) Function type.
2: r is a function, which we call with the time to get the zero coupon bond (a.k.a. spot) rate for the given timepoint.

value (generic function with 5 methods)

Now it works:

value(ZeroCouponBond(100.0, 5), econ_sin)

82.03058910862806

The important thing to note here is that the compiler is using the most specific method of the function (value(bond::ZeroCouponBond, r::T) where {T<:Function}). Both the types of the arguments are influencing the decision of which method to use. We could go on to define the appropriate method for CouponBond to complete the example.
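One possible way to complete the example is sketched below: a method for any AbstractBond that discounts each cashflow at the spot rate for its own time. This is an illustration under the assumption that r(t) returns an annually compounded spot rate for time t; it is not necessarily how the complete model would be written. The supporting definitions are repeated so the block runs on its own.

```julia
abstract type AbstractBond end

struct CouponBond <: AbstractBond
    par::Float64
    coupon::Float64
    tenor::Int
end

cashflow(bond::CouponBond, time) =
    time == bond.tenor ? (1 + bond.coupon) * bond.par : bond.coupon * bond.par

# Assumption: r(t) is the annually compounded spot rate applicable to time t
function value(bond::AbstractBond, r::Function)
    pv = 0.0
    for t in 1:bond.tenor
        pv += cashflow(bond, t) / (1 + r(t))^t
    end
    return pv
end

# With a flat curve, the result should agree with the scalar-rate case (par here)
flat = t -> 0.05
flat_value = value(CouponBond(100.0, 0.05, 5), flat)

# The sinusoidal curve from the text
sinusoidal = t -> 0.05 + sin(t) / 100
sin_value = value(CouponBond(100.0, 0.05, 5), sinusoidal)
```

With a method like this in place, the relay method that forwards econ.riskfree would let a CouponBond be valued against econ_sin as well.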

7.5 Object-Oriented Design

Object-oriented (OO) type systems use the analogy that various parts of the system are their own objects which encapsulate both data and behavior. Object-oriented design is often one of the first computer programming abstractions introduced because it is very relatable¹; however, over-relying on OO patterns has a number of flaws. Julia does not natively have traditional OO classes and types, but much of OO design can be emulated in Julia, except for data inheritance.

We bring up object-oriented design for comparison’s sake, but think that ultimately choosing a data-driven or functional design is better for financial modeling. Of course, many robust, well-used financial models have been built with OO designs, but in our experience the abstractions become unnatural and maintenance unwieldy beyond simple examples. We’ll now discuss some aspects of OO design and why the overuse of OO is not preferred.

Note

For readers without background in OO programming, the main features of OO languages are:

Hierarchical type structures, which include concrete and abstract types (often called classes instead of types).
Sub-classes inherit both behavior and data (in Julia, subtypes only inherit behavior, not data).
Functions that depend on the type of the object need to be ascribed to a single class and then can dispatch more specifically on the given argument’s type.

7.6 Assigning Behavior

Needing to assign methods to a single class can lead to awkward design limitations - when multiple objects are involved in a computation, why dictate that only one of them “controls” the logic?

The value function is a good example of this. If we had to assign value to one of the objects involved, should it be the economic parameters or the asset contracts? The choice is not obvious at all. Isn’t it the market (economic parameters) that determines the value? But then if value were to be a method wholly owned by the economic parameters, how could it possibly define in advance the valuation semantics of all types of assets? What if one wanted to extend the valuation to a new asset class? Downstream users or developers would need to modify the economic types to handle new assets they wanted to value. However, if the economic types are owned by an upstream package, they can’t be extended this way.

This is an issue with traditional OO designs, and one that multiple dispatch resolves elegantly.
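The sketch below illustrates the point, using a module as a stand-in for an upstream package. The module name `Upstream`, the `total_value` helper, and the `Equity` type are all hypothetical and introduced only for this example.

```julia
# Stand-in for an upstream package (hypothetical module name)
module Upstream
    abstract type AbstractAsset end

    struct EconomicAssumptions{T}
        riskfree::T
    end

    struct Cash <: AbstractAsset
        balance::Float64
    end

    value(asset::Cash, ea::EconomicAssumptions) = asset.balance

    # Portfolio-level logic written once, upstream
    total_value(portfolio, ea) = mapreduce(a -> value(a, ea), +, portfolio)
end

# Downstream code: a new asset type (hypothetical) ...
struct Equity <: Upstream.AbstractAsset
    shares::Float64
    price::Float64
end

# ... and a new method added to the upstream function, without editing Upstream
Upstream.value(e::Equity, ea::Upstream.EconomicAssumptions) = e.shares * e.price

econ = Upstream.EconomicAssumptions(0.05)
portfolio_total = Upstream.total_value([Upstream.Cash(50.0), Equity(10.0, 12.5)], econ)
```

Neither the asset nor the economic assumptions "owns" value: the new method is keyed on the combination of both argument types, and the upstream `total_value` picks it up with no changes.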

7.7 Inheritance

We discussed the type hierarchy in Chapter 5 and in most OO implementations this hierarchy comes with inheriting both data and behavior. This is different from Julia where subtypes inherit behavior but not data from the parent type.
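A short sketch of what "behavior but not data" means in practice: the abstract type declares no fields, each concrete type restates the fields it needs, and a method written against the abstract type is available to every subtype. The `maturity_label` helper is hypothetical and relies, by convention, on each bond type having a `tenor` field.

```julia
abstract type AbstractBond end        # abstract types declare no fields

struct CouponBond <: AbstractBond
    par::Float64
    coupon::Float64
    tenor::Int
end

struct ZeroCouponBond <: AbstractBond
    par::Float64                       # fields are restated, not inherited
    tenor::Int
end

# Behavior defined on the abstract type is available to every subtype
maturity_label(bond::AbstractBond) = string(bond.tenor, "-year bond")

maturity_label(CouponBond(100.0, 0.05, 5))     # "5-year bond"
maturity_label(ZeroCouponBond(100.0, 10))      # "10-year bond"

# Each concrete type has exactly the fields it declared
fieldnames(CouponBond)       # (:par, :coupon, :tenor)
fieldnames(ZeroCouponBond)   # (:par, :tenor)
```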

Inheriting the data tends to introduce a tight coupling between the parent and the child classes in OO systems. This tight coupling can lead to several issues, particularly as systems grow in complexity. For example, changes in the parent class can inadvertently affect the behavior of all its child classes, which can be problematic if these changes are not carefully managed. This is often referred to as the “fragile base class problem,” where base classes are delicate and changes to them can have widespread, unintended consequences.

Another issue with inheritance in OO design is the temptation to use it for code reuse, which can lead to inappropriate hierarchies. Developers might create deep inheritance structures just to reuse code, leading to a scenario where classes are not logically related but are forced into a hierarchy. This can make the system harder to understand and maintain.

Moreover, inheritance can sometimes lead to the duplication of code across the hierarchy, especially if the inherited behavior needs to be slightly modified in different child classes. This goes against the DRY (Don’t Repeat Yourself) principle, which is a fundamental concept in software engineering advocating for the reduction of repetition in code.

7.7.1 Composition over Inheritance

To mitigate some of the problems associated with inheritance, there’s a growing preference for composition. Composition involves creating objects that contain instances of other objects to achieve complex behaviors. This approach is more flexible than inheritance as it allows for the creation of more modular and reusable code. There is a general preference for “composition over inheritance” among professional developers these days.

In composition, objects are constructed from other objects, and behaviors are delegated to these contained objects. This approach allows for greater flexibility, as it’s easier to change the behavior of a system by replacing parts of it without affecting the entire hierarchy, as is often the case with inheritance.

Composition looks like this:

struct CUSIP
    code::String
end

struct FixedBond
    coupon::Float64
    tenor::Float64
end

struct FloatingBond
    spread::Float64
    tenor::Float64
end

struct MunicipalBond
    cusip::CUSIP
    fi::FixedBond
end

struct Swap
    float_leg::FloatingBond
    fixed_leg::FixedBond
end

struct ListedOption
    cusip::CUSIP
    #... other data fields
end

struct UnlistedBond
    fi::FixedBond
end

# define behavior which relies on delegation to components
function last_transaction(c::CUSIP)
    # ...perform lookup of data
end
last_transaction(asset) = last_transaction(asset.cusip)

function duration(f::FixedBond)
    # ... calculate duration
end
duration(asset) = duration(asset.fi)

In the above example, there are a number of asset classes that have CUSIP-related attributes (i.e. the 9 character code) and behavior (e.g. being able to look up transaction data). Other assets have fixed income attributes and behavior (e.g. calculating a duration). There’s no clear hierarchy here.

Composition lets us bundle the data and behavior together without needing complex chains of inheritance.
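As a runnable sketch of delegation, the example below uses two trivial accessors in place of the transaction lookup and duration calculation. The accessors `annual_coupon` and `identifier`, and the made-up identifier string, are assumptions for illustration only.

```julia
struct CUSIP
    code::String
end

struct FixedBond
    coupon::Float64
    tenor::Float64
end

struct MunicipalBond
    cusip::CUSIP
    fi::FixedBond
end

struct UnlistedBond
    fi::FixedBond
end

# Hypothetical accessors: the component implements the behavior,
# and the untyped fallback delegates to the relevant field
annual_coupon(f::FixedBond) = f.coupon
annual_coupon(asset) = annual_coupon(asset.fi)

identifier(c::CUSIP) = c.code
identifier(asset) = identifier(asset.cusip)

muni = MunicipalBond(CUSIP("123456AB7"), FixedBond(0.04, 10.0))  # made-up identifier
unlisted = UnlistedBond(FixedBond(0.06, 5.0))

annual_coupon(muni)       # 0.04, delegated through muni.fi
annual_coupon(unlisted)   # 0.06, delegated through unlisted.fi
identifier(muni)          # "123456AB7", delegated through muni.cusip
# identifier(unlisted) would throw, because UnlistedBond has no `cusip` field
```

Each asset gains only the behavior of the components it actually contains, without either asset needing to sit in a shared hierarchy.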

Note

A CUSIP (Committee on Uniform Security Identification Procedures) number is a unique nine-character alphanumeric code assigned to securities, such as stocks and bonds, in the United States and Canada. This code is used to facilitate the clearing and settlement process of securities and to uniquely identify them in transactions and records.

¹ “Many people who have no idea how a computer works find the idea of object-oriented programming quite natural. In contrast, many people who have experience with computers initially think there is something strange about object oriented systems.” - David Robson, “Object Oriented Software Systems” in Byte Magazine (1981).