The reward is zero on all transitions except those on which the gambler reaches his goal, when it is +1. The state-value function then gives the probability of winning from each state. Figure 4.6 shows the change in the value function over successive sweeps of value iteration, and the final policy found, for the case where the probability of the coin coming up heads is p_h = 0.4.
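As a concrete illustration, here is a minimal sketch of value iteration for this problem, assuming p_h = 0.4, a goal of $100, and dummy terminal states for capital 0 and 100 held fixed at values 0 and 1. The function name and defaults are illustrative, not from the text:

```python
def gambler_value_iteration(p_h=0.4, goal=100, theta=1e-9):
    """Sketch of value iteration for the gambler's problem."""
    # V[0] and V[goal] act as dummy terminal states with fixed
    # values 0 and 1; reaching the goal yields the +1 reward.
    V = [0.0] * (goal + 1)
    V[goal] = 1.0
    while True:
        delta = 0.0
        for s in range(1, goal):  # nonterminal capital levels 1..99
            # Stakes run from 1 to min(s, goal - s): the gambler can
            # bet neither more than he has nor past the goal.
            returns = [
                p_h * V[s + a] + (1 - p_h) * V[s - a]
                for a in range(1, min(s, goal - s) + 1)
            ]
            best = max(returns)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < theta:
            break
    # Extract a greedy policy; ties are broken toward the smallest
    # stake, so this is one optimal policy, not the unique one.
    policy = {}
    for s in range(1, goal):
        actions = range(1, min(s, goal - s) + 1)
        policy[s] = max(
            actions, key=lambda a: p_h * V[s + a] + (1 - p_h) * V[s - a]
        )
    return V, policy
```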
Why does the optimal policy for the gambler's problem have such a curious form? In particular, for capital of 50 it bets it all on one flip, but for capital of 51 it does not. Why is this a good policy?
Implement value iteration for the gambler's problem and solve it for p_h = 0.25 and p_h = 0.55. In programming, you may find it convenient to introduce two dummy states corresponding to termination with capital of 0 and 100 dollars, giving them values of 0 and 1, respectively. Show your results graphically, as in Figure 4.6. Are your results stable as θ → 0?
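One way to probe the stability question, assuming the gambler_value_iteration sketch above, is to rerun the routine with successively smaller stopping thresholds θ and compare the resulting values and stakes; the states and thresholds printed here are arbitrary choices for illustration:

```python
# Rerun with shrinking theta and watch whether the value and the
# greedy stake at a given capital level settle down as theta -> 0.
for theta in (1e-3, 1e-6, 1e-9, 1e-12):
    V, policy = gambler_value_iteration(p_h=0.25, theta=theta)
    print(f"theta={theta:g}  V[50]={V[50]:.6f}  stake at 50: {policy[50]}")
```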