We now develop in detail one class of learning methods for function approximation in
value prediction, those based on gradient descent. Gradient-descent methods are among
transpose), and is a smooth differentiable function of for all . For now,
let us assume that on each step, t, we observe a new example . These
states, or even all the examples, exactly correct. In addition, we must generalize to all the
other states that have not appeared in examples.
most reduce the error on that example: