Nested Fixed Point Algorithm

The starting point for estimating dynamic discrete choice models is John Rust’s nested fixed point algorithm. This page provides a programmatic overview of the linear model in his paper “Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher,” although it can be adapted to other specifications easily enough. This page assumes that you are already familiar with the technical aspects of the estimation, and only discusses the programmatic aspects. The estimation technique has been simplified in a number of ways to ease exposition. First, I only use the contraction mapping to find the fixed point in the dynamic program (rather than Newton’s method). Second, I use an optimization package rather than program the BHHH algorithm myself. If this means nothing to you, it does not matter. The rest of the material is self-contained; it just doesn’t follow Rust’s paper step by step. My primary source for this program is Rust’s original paper along with its online documentation. I also found Holger Sieg’s slides_io_ddc helpful.

The optimization package that I use is Optim; the data section below also uses ExcelReaders and DataFrames. You can install a package by running Pkg.add("Optim") (and likewise for the others). If they are already installed you can just run the next lines. Note that the code on this page was written for Julia 0.6, so newer versions of Julia will require minor syntax changes (mostly elementwise broadcasting).

using Optim
using ExcelReaders

It helps to start off with organized code. In Julia, you have the option to create user-defined types, which provide a convenient way to store variables and keep your code general. For instance, if you store all of the model primitives in a single type, every function below can take the model as a single argument rather than a long list of parameters.

Foresight helps when determining what should be in your user-defined type. Because of additive separability and conditional independence, we know that Rust’s model is completely characterized by the expected value function

EV(o,d) = \sum_{y \in S} \log \left[ \sum_{j \in \mathbb{D}} \exp \left\lbrace u(j, y; \theta) + \beta EV(j, y) \right\rbrace \right] p(y\ |\ o,d)

The unknown components in this equation are \beta, \theta (which will determine the value of u), EV, and p(y\ |\ o, d). Remember that the nested fixed point algorithm depends on a discrete state space. Rust takes the continuous value of “accumulated mileage” and assigns each value to a bin, e.g. [0,\ 5,000] = 1, (5,000,\ 10,000] = 2, etc. Over a discrete state space, any function can be represented by a vector or a matrix. For instance, assume we want to represent the function f(x) = x^2 for the values x = \{0,1,2,3,4,5\}. We could either program a function that takes a single input x and outputs x^2, or we could define the vector f_x = \{0, 1, 4, 9, 16, 25\}, where each value of this vector corresponds to x^2 of its index position. It might not seem helpful to define a function in the second way; however, it is far more convenient when you don’t know the functional form of the function you are representing. Naturally EV(o,d) won’t have a functional form, but it will still be representable by a vector.
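As a quick illustration of the second representation (a toy sketch, not part of Rust’s code):

f(x) = x^2                      # functional form
f_x = [x^2 for x in 0:5]        # tabular form: [0, 1, 4, 9, 16, 25]

The tabular form is exactly how EV will be stored below: a vector whose i-th entry is the value of the (unknown) function at the i-th state.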

With these considerations in mind, I define the type RustModel and have it store values for \beta, \theta, EV, and p(y\ |\ o, d).

type RustModel
    beta::Float64 # Discount Factor
    params::Array{Float64} # Utility Function Parameters
    pi::Array{Float64} # Transition Probabilities
    EV::Array{Float64} # Expected Value Array
    K::Int32 # Size of State Space

   # Initialize Model Parameters
    function RustModel( beta = .9999, K = 90 )
       EV = ones(K, 1)
       pi = [.348, .639, .013]
       params = [3.6, 10]
       return new( beta, params, pi, EV, K )
    end
end;

m = RustModel()

println("Beta is ", m.beta)
println("Guesses for theta are ", m.params)
println("Transition probability guesses are ", m.pi)
println("Size of State Space: ", m.K)
Beta is 0.9999
Guesses for theta are [3.6, 10.0]
Transition probability guesses are [0.348, 0.639, 0.013]
Size of State Space: 90

I similarly define a type for the data. The two variables will be the endogenous decision variable (whether the engine is replaced) and the exogenous state variable (the mileage bin).

type RustData
    endog::Array{Int32}
    exog::Array{Int32}

    function RustData( endog = [], exog = [])
        return new( endog, exog )
    end
end

Per-Period Returns Function

The return function is given by the following discrete linear form

u(0, o_t) = -\theta_{11} o_t

and

u(1, o_t) = - RC

where RC is the replacement cost of an engine, o_t is the current state variable, and d_t is the decision at time t. The utility function is translated into a 2 by K matrix, where the first row is the utility from d = 0 in each state and the second row is the utility from d = 1 in each state. Since utility is equal to -RC when d = 1, the second row is a constant vector.

The function u takes a RustModel variable as the input and returns a 2 by K matrix (where K is the size of the state space). The parameters that are used from the RustModel are K and \theta (called params). The first entry in params is \theta_{11} and the second entry is RC. Following Rust, the mileage state is scaled by 0.001 inside the function.

function u(model::RustModel)
    S = collect(1:model.K)' # Generate State Space range, i.e. [1, 2, 3, 4, ...]
    d_0 = .001 * model.params[1] * S # Utility from not replacing
    d_1 = model.params[2] * ones(1, model.K) # Utility from replacing
    U = vcat(d_0, d_1) # Utility matrix
    return -U
end

u(m)
2×90 Array{Float64,2}:
  -0.0036   -0.0072   -0.0108   -0.0144  …   -0.3168   -0.3204   -0.324
 -10.0     -10.0     -10.0     -10.0        -10.0     -10.0     -10.0

Transition Probabilities

The transition matrix is assumed to have a simple form. Given any state today, the probability of staying in that state is \pi_0, the probability of moving forward one state is \pi_1 and the probability of moving forward two states is \pi_2 = 1-\pi_0-\pi_1. Because the state space is discrete, there is a maximum state that can be attained. This state, by necessity, becomes an absorbing state, i.e. if we ever arrive at this state we will be stuck in it with probability 1. The transition matrix becomes

P = \left\lbrace\begin{matrix}\pi_0 & \pi_1 & \pi_2 & 0 & 0 & \cdots & 0 & 0\\0 & \pi_0 & \pi_1 & \pi_2 & 0 & \cdots & 0 & 0\\\vdots & & & & \ddots & & & \vdots\\0 & 0 & 0 & 0 & 0 & \cdots & \pi_0 & 1 - \pi_0 \\0 & 0 & 0 & 0 & 0 & \cdots & 0 & 1 \\\end{matrix}\right\rbrace

Note that we can (and do) estimate \pi_0, \pi_1, and \pi_2 via maximum likelihood. Naturally, the maximum likelihood estimator is the sample frequency, \hat{\pi}_0 = \frac{1}{n} \sum_{i = 1}^n I\{\Delta x_i = 0\}, etc., where \Delta x is the per-period change in the binned mileage state. These are the probabilities that will be used when estimating the model. The following program creates the transition matrix from the probabilities given in the RustModel. Note that the number of probabilities to be estimated depends on the discretization of the state space: more bins means more probabilities to be estimated. As a computational side note, the majority of entries in this matrix are 0, so using a sparse matrix leads to efficiency gains in this instance. For the sake of illustration, the sparse matrix is not introduced until the contraction mapping function below.
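For concreteness, here is a minimal sketch of these frequency estimators, assuming mileage_increments is a vector holding the per-period changes \Delta x in the binned mileage state (the estimation on Rust’s actual data appears near the bottom of this page):

pi_hat_0 = sum(mileage_increments .== 0) / length(mileage_increments)
pi_hat_1 = sum(mileage_increments .== 1) / length(mileage_increments)
pi_hat_2 = 1 - pi_hat_0 - pi_hat_1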

The function transition_probs takes a RustModel variable and returns a K by K transition matrix. The variables used from the RustModel variable are \pi and K.

function transition_probs(model::RustModel)
    t = length(model.pi)
    ttmp = zeros(model.K - t, model.K) # Transition Probabilities
    for i in 1:model.K - t
        for j in 0:t-1
            ttmp[i, i + j] = model.pi[j+1]
        end
    end
    atmp = zeros(t,t) # Absorbing State Probabilities
    for i in 0:t - 1
        atmp[i + 1,:] = [zeros(1,i) model.pi[1:t - i - 1]' (1 - sum(model.pi[1:t - i - 1]))]
    end
    return [ttmp ; zeros(t, model.K - t) atmp]
end;

transition_probs( m )
90×90 Array{Float64,2}:
 0.348  0.639  0.013  0.0    0.0    …  0.0    0.0    0.0    0.0    0.0  
 0.0    0.348  0.639  0.013  0.0       0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.348  0.639  0.013     0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.348  0.639     0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.348     0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0    …  0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0    …  0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0  
 ⋮                                  ⋱  ⋮                                
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0    …  0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0       0.013  0.0    0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0       0.639  0.013  0.0    0.0    0.0  
 0.0    0.0    0.0    0.0    0.0    …  0.348  0.639  0.013  0.0    0.0  
 0.0    0.0    0.0    0.0    0.0       0.0    0.348  0.639  0.013  0.0  
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.348  0.639  0.013
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.348  0.652
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    1.0  

Social Surplus Function

Remember that the expected value function takes the form

EV(o,d) = \sum_{y \in S} G(y) p(y|o,d )

where G(y) is the social surplus function. I show in the notes that the social surplus function has the form

G(o_t) = \log \left\lbrace \sum_{j \in \mathbb{D}} \exp \left[u(j, o_t) + \beta EV(j, o_t) \right] \right\rbrace

Note that EV(o,1) = EV(0,0). So the expected value of replacing a bus engine is the same as the expected value of not replacing a bus engine that has zero accumulated mileage. The EV function is therefore a function of one variable, o, not two, and we can treat it as a length-K vector.

The program ss takes a RustModel variable and returns a length-K vector. The variables from the RustModel that are used are \beta and the EV vector. The function also uses the previously defined function u. Note that there are numerical precision issues here. Because \beta is chosen to be close to 1, we generate large negative values for EV, and because a computer has limited precision, exponentiating them can underflow to zero. For instance, e^{-1700} will result in 0 for many calculators (and Google). Therefore \ln e^{-1,700} results in -Inf instead of -1,700, and the estimation will stop. Because the social surplus function takes logs of the final result, we can subtract the current value of EV before exponentiation and then add it back on after taking logs (the usual log-sum-exp recentering). This is numerically identical to the correct number and fixes the underflow issue, e.g. -1,700 + \ln e^{-1,700 + 1,700} = -1,700.
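A quick numerical illustration of the recentering trick (a standalone sketch, independent of the model code):

x = [-1700.0, -1701.0]
log(sum(exp.(x)))             # -Inf, because exp.(x) underflows to zero
c = maximum(x)
c + log(sum(exp.(x .- c)))    # ≈ -1699.69, the correct value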

function ss(model::RustModel)
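    # Recenter by the current EV before exponentiating; add it back after the log to avoid underflow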
    ss_val = (  exp( u( model )[1,:] + model.beta * model.EV - model.EV) +
                exp( u( model )[2,:] + model.beta * model.EV[1] - model.EV) )
    return model.EV + log( ss_val )
end;

Contraction Mapping

With the social surplus vector (i.e. the vector of the functions values for each state) and the transition probabilities we can construct the expected value vector

EV = P \cdot G(EV)

The contraction mapping uses an initial guess EV_0 and calculates EV_1 = P \cdot G(EV_0). It continues this iteration until the maximum difference between EV_n and EV_{n-1} is sufficiently small (you can choose whatever tolerance you would like, although the smaller the tolerance, the longer it takes to converge).

This function takes a RustModel variable and doesn’t return anything. It simply overwrites the EV field of the RustModel it was given with the fixed point it calculates.

function contraction_mapping( model::RustModel )
    P = sparse( transition_probs( model ) ) # Transition Matrix (K x K)
    eps = 1 # Set epsilon to something greater than 0
    while eps > .000001
        EV1 = P * ss( model )
        eps = maximum(abs(EV1 - model.EV))
        model.EV = EV1
    end
end;

m.EV = ones(1, m.K)'
contraction_mapping( m )
println(m.EV)
[-1718.29; -1718.54; -1718.78; -1719.02; -1719.25; -1719.48; -1719.71; -1719.92; -1720.14; -1720.34; -1720.54; -1720.74; -1720.93; -1721.12; -1721.3; -1721.47; -1721.65; -1721.81; -1721.97; -1722.13; -1722.28; -1722.42; -1722.57; -1722.7; -1722.84; -1722.96; -1723.09; -1723.21; -1723.32; -1723.43; -1723.54; -1723.64; -1723.74; -1723.84; -1723.93; -1724.02; -1724.11; -1724.19; -1724.27; -1724.35; -1724.42; -1724.49; -1724.56; -1724.63; -1724.69; -1724.76; -1724.82; -1724.87; -1724.93; -1724.98; -1725.04; -1725.09; -1725.14; -1725.18; -1725.23; -1725.27; -1725.32; -1725.36; -1725.4; -1725.44; -1725.47; -1725.51; -1725.55; -1725.58; -1725.62; -1725.65; -1725.68; -1725.71; -1725.74; -1725.77; -1725.8; -1725.83; -1725.85; -1725.88; -1725.91; -1725.93; -1725.95; -1725.98; -1726.0; -1726.02; -1726.04; -1726.06; -1726.08; -1726.1; -1726.11; -1726.13; -1726.14; -1726.15; -1726.15; -1726.15]

Choice Probabilities

Choice probabilities can be derived from the social surplus function. The expression for these probabilities is

P(d|o) = \frac{e^{\tilde{v}(o,d)}}{\sum_j e^{ \tilde{v}(o,j)}}

where \tilde{v}(o,d) = u(o,d) + \beta EV(o, d).
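Since there are only two alternatives, this is a binary logit, so the probability of keeping the engine can equivalently be written as

P(0\ |\ o) = \frac{1}{1 + \exp\left\lbrace \tilde{v}(o,1) - \tilde{v}(o,0) \right\rbrace}

and only the difference between the two values matters.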

The following function returns a choice probability vector of the form

P_k = [\ P(0\ |\ 0),\ P(0\ |\ 1),\ P(0\ |\ 2), ...,\ P(0\ |\ K)\ ]

Again, numerical precision is an issue, so the maximum value of EV is subtracted from the exponents in the numerator and denominator. This is equivalent to multiplying the numerator and denominator by the same constant, and it resolves the underflow.

function choice_p( model::RustModel )
    max_EV = maximum( model.EV )
    P_k = exp( u( model )[1,:] + model.beta *  model.EV - max_EV ) ./
    ( exp( u( model )[1, :] + model.beta *  model.EV - max_EV ) + exp( u( model )[2,:] + model.beta *  model.EV[1] - max_EV ) )
    return P_k
end;

Partial Log-Likelihood

The likelihood function, given the choice probabilities and the transition probabilities, is given by

L(\theta\ |\ x_1, x_2, ..., x_T, d_1, d_2, ..., d_T ) = \prod_{t=2}^{T} P(d_t\ |\ x_t, \theta )p(x_t\ |\ x_{t-1}, d_{t-1}, \theta)

A simpler formulation, which still provides consistent estimates, is the partial likelihood function

PL(\theta\ |\ x_1, x_2, ..., x_T, d_1, d_2, ..., d_T ) = \prod_{t=2}^{T} P(d_t\ |\ x_t, \theta )

First we need to pull out, for each observation, the choice probability corresponding to the observed state. After taking the log of these probabilities, we select the log-probability of the agent’s actual choice and sum over observations. This sum is the partial log-likelihood; the function below returns its negative, since Optim minimizes.

function partialLL( model::RustModel, data::RustData)
    decision_obs = data.endog
    state_obs = data.exog
    cp_tmp = choice_p( model )
    relevant_probs = [ cp_tmp[convert(Int, i)] for i in state_obs ]
    pll = [ if decision == 0 log(r_p) else log(1 - r_p) end for (decision, r_p) in zip(decision_obs, relevant_probs)]
    return -sum(pll)
end

function ll( model::RustModel, data::RustData )

    function objFunc( params )
        model.params = params
        contraction_mapping( model )
        pll = partialLL( model, data )
        return pll
    end

    params0 = [.01, 4]
    optimum = optimize( objFunc, params0 )
    return optimum
end

John Rust’s Bus Engine Replacement Data

This section just runs code to clean Rust’s original datasets. It can largely be ignored. Just note that the omax and n variables below need to match your choice of K in the RustModel type. The data has been compiled from files that can be found on the companion website to Aguirregabiria and Mira’s paper “Dynamic discrete choice structural models: A survey” (see here).

using DataFrames
d309 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "d309", header = false);
g870 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "g870", header = false);
rt50 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "rt50", header = false);
t8h203 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "t8h203", header = false);
a452372 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "a452372", header = false);
a452374 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "a452374", header = false);
a530872 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "a530872", header = false);
a530874 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "a530874", header = false);
a530875 = readxlsheet(DataFrame, "[path]/Datasets/NFXP.xlsx", "a530875", header = false);

Each column of the matrix corresponds to an individual bus. The first 11 rows correspond to header information. Respectively,

1. Bus Number
2. Month Purchased
3. Year Purchased
4. Month of 1st Engine Replacement
5. Year of  1st Engine Replacement
6. Odometer at Replacement
7. Month of 2nd Engine Replacement
8. Year of  2nd Engine Replacement
9. Odometer at Replacement
10. Month Odometer Data Begins
11. Year Odometer Data Begins
# Reshape to Proper Matrix Format
d309 = reshape(d309[1], 110, 4);
g870 = reshape(g870[1], 36, 15);
rt50 = reshape(rt50[1], 60, 4);
t8h203 = reshape(t8h203[1], 81, 48);
a452372 = reshape(a452372[1], 137, 18);
a452374 = reshape(a452374[1], 137, 10);
a530872 = reshape(a530872[1], 137, 18);
a530874 = reshape(a530874[1], 137, 12);
a530875 = reshape(a530875[1], 128, 37);

n = 90; # Fixed point dimension
omax = 450000; # Upper bound on the odometer reading

rt=zeros(n,1);
nt=copy(rt);
rc=copy(rt);
nc=copy(rt); 

milecnt=zeros(n,1);

function build(input)
    dt = copy(input)
    # Get odometer values at replacement
    ov1=dt[6,:]'
    ov2=dt[9,:]'

    # Get dimensions of the underlying data
    nr = size(dt, 1);
    nc = size(dt, 2);
    #
    dtc= (dt[12:nr,:] .>= ov1) .* (ov1 .> 0) + (dt[12:nr,:] .>= ov2) .* (ov2 .> 0);
    dtx = dt[12:nr,:] + ov1 .* dtc .* (dtc-2) - .5*ov2.*dtc.*(dtc-1);
    dtx = ceil(n*dtx/omax);
    #
    dtc=vcat((dtc[2:nr-11,:]-dtc[1:nr-12,:]), zeros(1, nc));
    mil=(dtx[2:nr-11,:]-dtx[1:nr-12,:]) + dtx[1:nr-12,:].*dtc[1:nr-12,:];
    df = DataFrame(dtc = copy(vec(dtc[2:nr-11,:])), dtx = copy(vec(dtx[2:nr-11,:])), mil = copy(vec(mil[:,:])));
    return df
end

inputDataset = build(g870);
for i in [rt50, t8h203, a530875 ]
    inputDataset = vcat(inputDataset, build(i))
end

Estimating the Rust Model

m = RustModel()

rd = RustData()
rd.endog = inputDataset[1]
rd.exog = inputDataset[2]

pi_1 = sum( [x == 0 for x in inputDataset[3] ] )/length(inputDataset[3])
pi_2 = sum( [x == 1 for x in inputDataset[3] ] )/length(inputDataset[3])
pi_3 = 1 - pi_1 - pi_2

m.pi = [ pi_1, pi_2, pi_3 ]

@time ot = ll( m, rd )
218.624701 seconds (2.45 G allocations: 217.564 GB, 8.66% gc time)

Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.01,4.0]
 * Minimizer: [2.6274875140792378,9.758217072326453]
 * Minimum: 3.002501e+02
 * Iterations: 1000
 * Convergence: false
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: false
   * Reached Maximum Number of Iterations: true
 * Objective Function Calls: 1017

Calculating Standard Errors

The derivation of the standard errors depends on your choice of u, so I will only derive them for the linear case. The variance of a maximum likelihood estimator is the inverse of the information matrix, where the information matrix is given by

I(\theta) = - \mathbb{E} \left[ \frac{\partial^2 \ln L(\theta)}{\partial \theta \partial \theta^{\prime}}\right]

We must therefore compute this matrix in order to get our standard errors. We can simplify by noting that the information identity implies that the covariance of the scores equals the negative of the expected Hessian at the true parameter values (see Train 2009 for a derivation of this property). Each observation’s score is the derivative of its log choice probability, and given our functional form above it is straightforward to see that

\frac{\partial \ln P(d\ \vert x, \theta)}{\partial \theta} = \left(1 - P(d\ \vert x, \theta) \right)\frac{\partial \tilde{v}(x,0) - \tilde{v}(x,1) }{\partial \theta}

(remember that \tilde{v}(x,d) = u(x,d) + \beta EV(x,d), and that EV(x,1) = EV(0)). For both d \in \{0,1\} it is easy to derive \frac{\partial u}{\partial \theta}, so it remains to calculate \frac{\partial EV(x)}{\partial \theta}. We can calculate this derivative using the implicit function theorem, which states that if F(x_0, y_0) = 0 and F_y is invertible at that point, then locally there exists a function y = f(x) with F(x, f(x)) = 0 such that

f^{\prime}(x_0) = - F_y^{-1}(x_0, y_0)F_x(x_0,y_0)

In our case the operator is (I - T)(\theta, EV) (because the fixed point requires (I - T)(\theta, EV) = 0). Rust calls the derivative of (I - T) with respect to EV (I - T^{\prime}), and I follow his notation. Applying the theorem shows that

EV^{\prime}(\theta) = -(I - T^{\prime})^{-1} \cdot \left(- T_{\theta}(\theta, EV(\theta))\right) = (I - T^{\prime})^{-1}T_{\theta}(\theta, EV)

where T_{\theta} is the partial derivative with respect to \theta. We have an expression for T and so we find analytic expressions for each piece and combine them. Remember that T(EV) = P \cdot \ln\left\lbrace \sum_d \exp\left(u(x,d) + \beta EV\right) \right\rbrace and in matrix notation we can write that as

T(EV) = \left[\begin{matrix}\pi_0 & \pi_1 & \pi_2 & 0 & 0 & \cdots & 0 & 0\\0 & \pi_0 & \pi_1 & \pi_2 & 0 & \cdots & 0 & 0\\\vdots & & & & \ddots & & & \vdots\\0 & 0 & 0 & 0 & 0 & \cdots & \pi_0 & 1 - \pi_0 \\0 & 0 & 0 & 0 & 0 & \cdots & 0 & 1 \\\end{matrix}\right]\left[\begin{matrix}\ln\left\lbrace \exp\left(u(0,0) + \beta EV(0)\right) + \exp\left(u(1,0) + \beta EV(0)\right) \right\rbrace\\\ln\left\lbrace \exp\left(u(0,1) + \beta EV(1)\right) + \exp\left(u(1,1) + \beta EV(0)\right) \right\rbrace\\\vdots\\\ln\left\lbrace \exp\left(u(0,K) + \beta EV(K)\right) + \exp\left(u(1,K) + \beta EV(0)\right) \right\rbrace\\\end{matrix}\right]

Let P(d\ \vert x, \theta) = p_{dx}. Then the derivative of this expression with respect to EV(0),...,EV(K) is given by

T^{\prime}(EV) = \beta \cdot \left[\begin{matrix}\pi_0 \left(p_{00} + p_{10}\right) + \pi_1 p_{11} + \pi_2 p_{12} & \pi_1 p_{01} & \pi_2 p_{02} & 0 & 0 & \cdots & 0 & 0\\\pi_0 p_{11} + \pi_1 p_{12} + \pi_2 p_{13} & \pi_0 p_{01} & \pi_1 p_{02} & \pi_2 p_{03} & 0 & \cdots & 0 & 0\\\vdots & & & & \ddots & & & \vdots\\p_{1K} & 0 & 0 & 0 & 0 & \cdots & 0 & p_{0K}\\\end{matrix}\right]

If you stare at the above matrix you will notice that the rows sum to 1 (after factoring out \beta), i.e. it is a transition probability matrix. This makes computing it easier, as columns 2 through K are just the element-wise product of the transition probability matrix and the choice probability vector. The first column is then just one minus the sum of the other K-1 columns.
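As a rough sketch of this construction (assuming, for illustration, that P holds the K by K keep-decision transition matrix from transition_probs and p0 holds the length-K vector of keep probabilities from choice_p; the actual implementation is in the code block further below):

cols = P[:, 2:end] .* p0[2:end]'                 # columns 2 through K: transition probs times keep probs
Tprime = m.beta * hcat(1 .- sum(cols, 2), cols)  # first column is one minus the sum of the other columns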

The final piece we need is T_{\theta}, which is straightforward to compute. Remember that \theta = \left(\theta_{11}, RC\right) (matching the ordering in params), so T_{\theta} = \left(T_{\theta_{11}}, T_{RC} \right) has components

T_{RC} = - P \cdot P(1\ \vert x, \theta)

T_{\theta_{11}} = P \cdot \left(-x\ P(0\ \vert x, \theta)\right)

Combining these expressions into a single matrix, we can calculate the score for each observation and take the outer product of the scores to estimate the negative Hessian. The square roots of the diagonal elements of the inverse of this matrix are our standard errors.
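Written out in terms of the individual scores s_t, the estimator computed in the code below is

\hat{I}(\hat\theta) = \sum_{t} s_t(\hat\theta)\, s_t(\hat\theta)^{\prime}, \qquad \widehat{SE}(\hat\theta_j) = \sqrt{\left[\hat{I}(\hat\theta)^{-1}\right]_{jj}}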

m.params = ot.minimizer
tmpP = choice_p(m)
byOb = tmpP[rd.exog,:]
 
T = length(rd.exog)
 
tmpT2 = transition_probs(m)[:,2:end] .* tmpP[2:end,:]'
 
dR = -(1-transition_probs(m) * tmpP)
dTheta = -(transition_probs(m)*(1:1:90)*-.001) .* tmpP
 
dEV = inv(eye(m.K) - m.beta * hcat(1 - sum(tmpT2,2), tmpT2)) * hcat(dTheta, dR)
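# The line above applies the implicit function theorem: dEV = (I - T')^{-1} * T_theta, with columns ordered (theta_11, RC)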
 
# Derivative of utility with respect to parameters
tmp = -(1 - byOb .* (rd.endog .== 0) .- (1-byOb) .* (rd.endog .== 1))
score = hcat( -.001*rd.exog, -ones(T, 1)).*tmp
 
# Add the derivative of the difference in Expected Value
score .+= m.beta*(dEV[1,:]' .* ones(T,1) .- dEV[rd.exog, :]) .*tmp

# Calculate inverse of the covariance
H = inv(score'score)
se = sqrt(diag(H))
 
hcat( ot.minimizer, se)
2×2 Array{Float64,2}:
 2.62749  0.616073
 9.75822  1.22672 

16 thoughts on “Nested Fixed Point Algorithm”

    1. When the agent chooses to replace the engine, the state is reset to 0 and the expected value of being in state 0 is given by the first element of EV. So the agent chooses between two values when in state x: u(0,x) + beta*EV(0, x) (the value of not replacing), and u(1,0) + beta*EV(1,0) (the value of replacing). Because EV(1,x) = EV(0, 0), we can represent EV by a single vector (rather than a vector for each choice), and just use the first value of EV whenever the decision to replace is made.

  1. Can I ask you another question? In order to sample the first state, we usually use a stationary distribution, which we obtain from the transition probabilities. However, in this case, the stationary probabilities are just (0, 0, …, 0, 1). So, when you simulate, do you just assume that the realized states start from x0?

    1. I think it makes more sense to start from the stationary distribution of the controlled process. The stationary probabilities are (0,0,…,1) because engine mileage always increases for the uncontrolled process, and so the stationary distribution will put all the mass on the upper bound. But if you consider the transitions of the controlled process, the process is almost never at the upper bound. For instance

      # Choice probabilities
      cp = 1 - choice_p(m);
      # Controlled system transition matrix
      Π = transition_probs( m ) .* (1-pps) + hcat(ones(500,1), zeros(500, 499)) .* pps;
      # Stationary Distribution
      statdst = inv((eye(m.K) - Π + ones(m.K,m.K))')*ones(m.K)

      The expected state of this system is ~ 121. However, for many of my applications, I simulate from the last observed state because I am studying how the state evolves from that point. So where I start the simulation typically depends on the problem I’m studying. Hope this answers your questions.

      1. Thank you very much, Mark. I assume that, by pps, you mean cp, and that hcat(ones(500,1), zeros(500, 499)) should be hcat(ones(90,1), zeros(90, 89)), right?

  2. Dear Mark,

    hi again. In line 10 of your last code chunk, you have an element-by-element product by the choice probabilities, as Rust does in his manual. Why isn’t it -0.001*transition_probs(m)*(tmpP.*(1:1:90))? This would result in a vector with the following lines:
    p(0|x1)*x1*f(x1|x1,0) + p(0|x2)*x2*f(x2|x1,0) + p(0|x3)*x3*f(x3|x1,0)
    p(0|x2)*x2*f(x2|x2,0) + p(0|x3)*x3*f(x3|x2,0) + p(0|x4)*x4*f(x4|x2,0)
    .
    .
    .

    Instead, in your and Rust’s code, the vector is
    p(0|x1) * [ x1*f(x1|x1,0) + x2*f(x2|x1,0) + x3*f(x3|x1,0) ]
    p(0|x2)* [ x2*f(x2|x2,0) + x3*f(x3|x2,0) + x4*f(x4|x2,0) ]
    .
    .
    .

    In line 9, I agree with dR, where you have a linear combination of the columns of the transition matrix weighted by the choice probabilities.

    Thank you,
    Eliza

    1. The code (or the equations) could be wrong. It’s been awhile since I looked at it. Have you tried taking the numerical derivative of the contraction mapping and comparing that result to the analytic derivative? That would be an easy way to check which is correct.

  3. Hello –

    Is the code associated with this post available as a direct download somewhere?

    Thanks for putting this together.

    Matt

  4. Dear Mark.
    Thank you for your code sharing.
    Except for some minor changes in julia syntax (mostly on elementwise calculation in array), it worked quite well in general.
    However, the last section on standard errors is problematic for me.

    First, in line 18 of the section, doesn’t one need to multiply by m.beta, given the form v = u + \beta*EV?
    Second, I ran the code with just simple changes for elementwise syntax, but it returns much smaller standard errors compared to the original: (0.0121674, 0.187222). I have been tackling this for a while, but cannot find why there is such a large difference.

    1. Thanks for catching this. Unfortunately, most of the code was written under Julia 0.6, so a lot of the syntax needs to be updated to be compatible with Julia 1.0. I updated the code block for three mistakes. First, in line 16, I wasn’t multiplying the score by (1 - P(d|x)). Second, the derivatives for theta and RC were switched in line 16. Finally, I added in m.beta following your suggestion. Once I made these adjustments I got the correct standard errors. I really appreciate the feedback. Let me know if you find anything else.

  5. Thanks for the great code and explanations! In the last section where the standard errors are calculated I was wondering why tmpP is not a (K times 1)-vector but seems to have a higher dimension in the second component, as you write tmpP[2:end,:] instead of tmpP[2:end]

    1. I’m fairly sure that it is Kx1 and Julia is just letting me get away with sloppy coding. The transition probabilities are in a matrix, so that elementwise multiplication wouldn’t make much sense if tmpP was multi-dimensional. When I have time I will check it and update as necessary. Thanks for pointing this out.
