Value function approximation and collocation method

Dynamic programming is basically solved by:

v1=U+beta*v

so as to maximize v1 where v1 and v are both n x 1 vector. We find k and k’ and then

update v to v1. We are considering a case with a single policy

function. We have to consider n x 1 vector of values in each step.

If we approximate the curve of value function v1 and v, we can reduce the number of

estimates in each step. That’s the main idea of this collocation method.

Usually we approximate the value function in a polynomial manner.

Recall that the algorithm of modified policy iterations is

(1) Set a grid consisting of k and k' columnwise and rowwise respectively.

(2) Calculate utility for consumption as U using the grid matrix above.

(3) Starting from a certain v, update v1=U+beta*v' so as to maximize v1.

(a) Find the corresponding k’ from (3) in value function iterations.

(b) Calculate utility from k and k’ as r.

vi=r+beta*vi_1

up until vi and vi_1 are almost the same

and set the final vi to vp.

(d) Repeat the whole thing by setting vp=vp1 until some criterion is met.

(4) Repeat (3) by setting v=v1 for many times.

(5) Find the final corresponding value of k as k' according to the maximum value.

as in 0912 and 0913. When we calculate vp by direct inversion, it is going to regular

policy iterations.

In (c ), we approximate vp and find it in terms of coefficients of polynomials so as to

satisfy vp=U+beta*vp when vp1=vp. That is, we can rewrite it by

poly(x)=U+beta*poly(x)

where poly(x) is some polynomial function of x.

Recall that we can draw the curve of value function for x versus value function.

If the order of polynomials is porder, we can reduce the number of estimates from

n to porder+1 including the intercept in each step. It may speed up the calculation

when we consider many number of points on the grid.