Recall that the algorithm of the original policy iteration consists of
(1) Set up a grid matrix of k and k', varying columnwise and rowwise respectively.
(2) Calculate the utility of consumption, U, on the grid matrix above.
(3) Starting from some initial v, update v1 = U + beta*v' so as to maximize v1.
(4) Repeat (3), setting v = v1, until some convergence criterion is met.
(5) Find the policy k' for each k as the maximizer of the final value.
together with
(a) Find the corresponding k' from step (3) of the value function iteration.
(b) Calculate the utility from k and k' as r.
(c) Starting from some vp, update vp1 = U + beta*vp, where vp = (1/(1-beta))*r.
(d) Repeat the whole procedure, setting vp = vp1, until some convergence criterion is met.
Here the exact change in the value function for a change in the policy function
is obtained by linear inversion: solving vp = r + beta*vp for vp gives
vp = (1/(1-beta))*r at each step.
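Steps (1)-(5) together with (a)-(d) can be sketched as follows. This is a minimal illustration, not the text's own code: log utility, full depreciation, production f(k) = k**alpha, and all parameter values are assumptions. It evaluates the policy with the matrix form of the linear inversion, vp = (I - beta*Q)^(-1) r, of which the scalar vp = (1/(1-beta))*r above is the special case where the policy leaves the state unchanged.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the text).
alpha, beta = 0.3, 0.95
n = 200
k = np.linspace(0.05, 0.5, n)          # (1) grid; here k indexes rows, k' columns
c = k[:, None]**alpha - k[None, :]     # consumption at each (k, k') pair
# (2) utility matrix; infeasible pairs (c <= 0) get a large negative penalty
U = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -1e10)

v = np.zeros(n)                        # (3) start from some v
for _ in range(1000):
    policy = np.argmax(U + beta*v[None, :], axis=1)   # (a) greedy choice of k'
    r = U[np.arange(n), policy]                       # (b) return under that policy
    # (c) linear inversion: solve vp = r + beta*Q@vp, where Q selects k' = g(k)
    Q = np.zeros((n, n))
    Q[np.arange(n), policy] = 1.0
    vp = np.linalg.solve(np.eye(n) - beta*Q, r)
    if np.max(np.abs(vp - v)) < 1e-8:                 # (d) stop when vp settles
        break
    v = vp
```

With a finite grid, the greedy step plus exact evaluation typically converges in a handful of outer iterations, which is the appeal of policy iteration over plain value function iteration.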
Now we consider its modified version.
The modified algorithm is:
(c') Starting from some vp, update vp1 = U + beta*vp, where vp is now computed directly by iteration:
Start from Vi_1 = 0.
Update Vi = r + beta*Vi_1 until Vi and Vi_1 are almost the same.
Set the final Vi to vp.
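The inner loop in (c') sums a geometric series term by term. A minimal sketch, assuming r is the return vector under the fixed policy (the vector used here is purely illustrative):

```python
import numpy as np

def evaluate_policy_iteratively(r, beta, tol=1e-10):
    """Inner loop of (c'): iterate Vi = r + beta*Vi_1 starting from Vi_1 = 0."""
    Vi_1 = np.zeros_like(r)
    while True:
        Vi = r + beta*Vi_1                      # one inner update
        if np.max(np.abs(Vi - Vi_1)) < tol:     # stop when Vi settles
            return Vi                           # the final Vi becomes vp
        Vi_1 = Vi

# The iterative limit agrees with the linear inversion (1/(1-beta))*r.
r = np.array([-1.0, -0.5, -0.2])                # illustrative return vector
beta = 0.95
vp = evaluate_policy_iteratively(r, beta)
```

With beta = 0.95 and tol = 1e-10 this takes a few hundred inner iterations, which is the extra cost relative to the one-shot inversion.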
Here we do not use the linear inversion of policy iteration; instead vp is found
by a kind of nonlinear (fixed-point) solving. Because of the convergence
criterion, the outer loop may take more iterations than the original policy
iteration, and of course each vp now requires many inner iterations to converge.