As in 1123, we implement value function approximation and policy function
interpolation at the same time. Value function approximation is especially
important for fast computation: a poor approximation can prevent convergence
or make the computation take a very long time.
Value function approximation -> speeds up convergence but does not improve the resulting values
Policy function interpolation -> improves the resulting values
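
The two steps above can be sketched as follows. This is only one possible reading of the notes, not the actual implementation: the model (a deterministic growth model with log utility and full depreciation), the parameters ALPHA and BETA, the grid sizes, and the helper name solve_vfi are all assumptions made for illustration. The value function is approximated off the state grid by linear interpolation inside the iteration, and the resulting policy is then interpolated onto a finer grid.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
ALPHA, BETA = 0.3, 0.9


def solve_vfi(k_grid, n_choice=200, tol=1e-6, max_iter=1000):
    """Value function iteration with a linearly interpolated value function."""
    v = np.zeros(len(k_grid))
    output = k_grid ** ALPHA
    # Candidate choices for next-period capital, finer than the state grid.
    k_next = np.linspace(k_grid[0], k_grid[-1], n_choice)
    cons = output[:, None] - k_next[None, :]
    # Infeasible choices (non-positive consumption) get utility -inf.
    util = np.where(cons > 0, np.log(np.maximum(cons, 1e-12)), -np.inf)
    policy = np.full(len(k_grid), k_grid[0])
    for _ in range(max_iter):
        # Value function approximation: evaluate V off the state grid.
        v_next = np.interp(k_next, k_grid, v)
        candidates = util + BETA * v_next[None, :]
        v_new = candidates.max(axis=1)
        policy = k_next[candidates.argmax(axis=1)]
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, policy
        v = v_new
    return v, policy


# Solve on a coarse state grid.
k_grid = np.linspace(0.05, 0.5, 50)
v, policy = solve_vfi(k_grid)

# Policy function interpolation: evaluate the policy on a finer grid.
k_fine = np.linspace(0.05, 0.5, 500)
policy_fine = np.interp(k_fine, k_grid, policy)

# For this particular model the exact policy is known in closed form,
# which gives a simple accuracy check.
exact = ALPHA * BETA * k_fine ** ALPHA
max_error = np.max(np.abs(policy_fine - exact))
```

In this sketch the interpolation of V only affects how each iteration evaluates the continuation value, while the final interpolation of the policy is what produces values between the coarse grid points.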