of the derivative matrices described above. This method computes the global scaling matrix $\mathbf{A}_k$ by recursively applying the matrix inversion lemma. This procedure provides results that are mathematically identical to those of conventional matrix inversion, regardless of the degree of decoupling employed. In addition, it can be employed for training any form of network, static or dynamic, as well as for the multistream procedure. On the other hand, we have found that this method often requires double-precision arithmetic to produce results that are statistically identical to those of EKF implementations based on explicit matrix inversion.
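As an illustration only (a minimal sketch assuming NumPy; the function name and data layout are our choices, not the authors' implementation), the recursion can start from $\mathbf{R}_k^{-1}$ and fold in the contribution of each weight group one rank-one term at a time via the Sherman-Morrison form of the matrix inversion lemma:

    import numpy as np

    def scaling_matrix_via_inversion_lemma(r_diag, H_blocks, P_blocks):
        """Sketch: A_k = [R_k + sum_i (H_k^i)^T P_k^i H_k^i]^{-1} built up
        by rank-one Sherman-Morrison updates, with no explicit inversion.

        r_diag:   diagonal of the measurement covariance R_k, length n_out
        H_blocks: derivative matrices H_k^i, each of shape (n_i, n_out)
        P_blocks: error covariance blocks P_k^i, each of shape (n_i, n_i)
        """
        A = np.diag(1.0 / np.asarray(r_diag))   # inverse of diagonal R_k
        for H, P in zip(H_blocks, P_blocks):
            # Factor P = L L^T so that H^T P H = V V^T with V = H^T L
            # (assumes P is positive definite).
            V = H.T @ np.linalg.cholesky(P)
            for v in V.T:
                # (M + v v^T)^{-1} = M^{-1} - (M^{-1} v)(M^{-1} v)^T / (1 + v^T M^{-1} v)
                Av = A @ v
                A -= np.outer(Av, Av) / (1.0 + v @ Av)
        return A

The result matches an explicit inversion of $\mathbf{R}_k + \sum_i (\mathbf{H}_k^i)^T \mathbf{P}_k^i \mathbf{H}_k^i$ up to rounding; the long chain of subtractions in the rank-one updates is also consistent with the observation above that double precision is often required in practice.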
The second class, developed by Plumer [17], treats each output component individually in an iterative procedure. This sequential procedure accumulates the weight vector update as each output component is processed, and applies the accumulated update only after all output signals have been processed; the error covariance matrix is updated sequentially. Plumer's sequential-update form of EKF turns out to be exactly equivalent to the batch form of GEKF given above, in which all output signals are processed simultaneously. For decoupled EKF training, however, sequential updates only approximate the updates obtained via the simultaneous DEKF recursion of Eqs. (2.12)–(2.15), though this approximation has been reported not to pose any problems during training.
Note that the scalar $r_{k,l}$ is the $l$th diagonal element of the measurement covariance matrix $\mathbf{R}_k$ in the simultaneous form of DEKF, that the scalar $\xi_{k,l}$ is the $l$th error signal, and that the vector $\mathbf{h}_{k,l}^i$ is the $l$th column of the augmented derivative matrix $\mathbf{H}_k^i$. After all output signals of all training streams have been processed, the weight vectors and error covariance matrices for all weight groups are updated.
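Plumer's exact recursions are given in [17]; the following is only a hedged sketch of the scheme for a single weight group, assuming NumPy (the function name, data layout, and the innovation-correction term are our reading of the deferred-update idea, not code from [17]):

    import numpy as np

    def sequential_update(w, P, H, xi, r_diag):
        """Sketch of a sequential-update step for one weight group.

        w:      weight vector, shape (n_w,)
        P:      error covariance matrix, shape (n_w, n_w)
        H:      augmented derivative matrix H_k^i, shape (n_w, n_out);
                column l is the vector h_{k,l}^i
        xi:     error signals xi_{k,l}, shape (n_out,)
        r_diag: diagonal elements r_{k,l} of R_k
        """
        dw = np.zeros_like(w)
        for l in range(H.shape[1]):
            h = H[:, l]
            Ph = P @ h
            a = 1.0 / (r_diag[l] + h @ Ph)      # scalar scaling factor
            k = a * Ph                          # Kalman gain for output l
            # Innovation corrected for the update accumulated so far
            # (our reading of the deferred-update scheme).
            dw += k * (xi[l] - h @ dw)
            P = P - np.outer(k, Ph)             # sequential covariance update
        return w + dw, P                        # weights changed only at the end

With a single weight group this loop reproduces the batch GEKF update exactly; with several decoupled groups, the sequentially updated covariances make the result only an approximation of the simultaneous DEKF recursion, as noted above.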
2.6.4.1 Without Artificial Process Noise
Square-root filtering has been described as a numerically stable alternative to performing the approximate error covariance matrix update given by the Riccati equation (2.6). Square-root filter methods are well known in the signal processing community [19], and were developed to guarantee that the positive-definiteness of the error covariance matrix is maintained throughout training. This insurance, however, is accompanied by increased computational complexity. Below, we summarize the square-root formulation for the case of no artificial process noise, with proper treatment of the EKF learning rate as given in Eq. (2.7) (we again assume $\mathbf{S}_k = \mathbf{I}$).

The square-root covariance filter update is based on the matrix factorization lemma, which states that for any pair of $J \times K$ matrices $\mathbf{B}_1$ and $\mathbf{B}_2$, with $J \le K$, the relation $\mathbf{B}_1 \mathbf{B}_1^T = \mathbf{B}_2 \mathbf{B}_2^T$ holds if and only if there exists a unitary matrix $\boldsymbol{\Theta}$ such that $\mathbf{B}_2 = \mathbf{B}_1 \boldsymbol{\Theta}$.
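As a quick numerical check of the lemma, assuming NumPy (variable names are ours): a QR decomposition of $\mathbf{B}_1^T$ supplies a unitary $\boldsymbol{\Theta}$ that triangularizes $\mathbf{B}_1$ into a post-array $\mathbf{B}_2$ with the same outer product:

    import numpy as np

    # J x K pre-array with J <= K.
    J, K = 3, 5
    rng = np.random.default_rng(0)
    B1 = rng.standard_normal((J, K))

    # QR of B1^T gives B1^T = Q R with Q a K x K unitary matrix, so
    # postmultiplying by Theta = Q triangularizes B1: B2 = B1 Q = R^T.
    Q, R = np.linalg.qr(B1.T, mode="complete")
    B2 = B1 @ Q

    # The lemma: B1 B1^T = B2 B2^T.
    assert np.allclose(B1 @ B1.T, B2 @ B2.T)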
With this in mind, the covariance update equations (2.3) and (2.6) can be written in matrix form as