Speedup algorithm: Modified policy iterations

The algorithm of modified policy iterations is

(1) Set a grid consisting of k and k' columnwise and rowwise respectively.

(2) Calculate utility for consumption as U using the grid matrix above.

(3) Starting from a certain v, update v1=U+beta*v' so as to maximize v1.

(a) Find the corresponding k’ from (3) in value function iterations.

(b) Calculate utility from k and k’ as r.

vi=r+beta*vi_1

up until vi and vi_1 are almost the same

and set the final vi to vp.

(d) Repeat the whole thing by setting vp=vp1 until some criterion is met.

(4) Repeat (3) by setting v=v1 for many times.

(5) Find the final corresponding value of k as k' according to the maximum value.

The main difference is by direct inversion or by numerical iterations.

Theoretically, they are the same. However, the modified version takes more iterations

due to errors in convergence criteria.

Behavior of value functions for original policy iterations

Behavior of value functions for modified policy iterations

where the first two are different from the former method and the rest

is involved to a little bit more iterations to final position. In both, they shift from up to

bottom once and then going up to the final position gradually.

The number of unmatched indices in each case

t= 1 diff #= 99

t= 2 diff #= 100

t= 3 diff #= 100

t= 4 diff #= 99

t= 5 diff #= 86

t= 6 diff #= 29

t= 7 diff #= 12

t= 8 diff #= 1

t= 9 diff #= 3

t= 9

t= 1 diff #= 99

t= 2 diff #= 100

t= 3 diff #= 100

t= 4 diff #= 99

t= 5 diff #= 86

t= 6 diff #= 29

t= 7 diff #= 12

t= 8 diff #= 1

t= 9 diff #= 3

t= 10 diff #= 2

t= 11 diff #= 0

t= 12 diff #= 0

t= 12　　

where the first case by policy iterations stops earlier than t=10 while the second case

by modified policy iterations stops later.