
Updating the hidden states

The basic setting for updating the hidden states is the same as above for the network weights. The correlations between consecutive states cause some changes to the formulas and require new ones for the adaptation of the correlation coefficients. All the feedforward computations use the marginal variances $\widetilde{s}_k(t)$, which are not actual variational parameters. This affects the derivatives with respect to the other parameters of the state distribution. Let us use the notation $C_p(\widetilde{s}_k(t))$ to mean that the $C_p$ part of the cost function is considered to be a function of the intermediate variables $\widetilde{s}_k(1), \ldots, \widetilde{s}_k(t)$ in addition to the variational parameters. This and Equation (5.48) yield the following rules for evaluating the derivatives of the true cost function:

$\displaystyle \frac{\partial C}{\partial \overset{\circ}{s}_k(t)} = \frac{\partial C_p(\widetilde{s}_k(t))}{\partial \widetilde{s}_k(t)} \frac{\partial \widetilde{s}_k(t)}{\partial \overset{\circ}{s}_k(t)} + \frac{\partial C_q}{\partial \overset{\circ}{s}_k(t)} = \frac{\partial C_p(\widetilde{s}_k(t))}{\partial \widetilde{s}_k(t)} + \frac{\partial C_q}{\partial \overset{\circ}{s}_k(t)}$ (6.35)

\begin{displaymath}\begin{split} \frac{\partial C}{\partial \breve{s}_k(t-1, t)} &= \frac{\partial C_p(\widetilde{s}_k(t))}{\partial \breve{s}_k(t-1, t)} + \frac{\partial C_p(\widetilde{s}_k(t))}{\partial \widetilde{s}_k(t)} \frac{\partial \widetilde{s}_k(t)}{\partial \breve{s}_k(t-1, t)} \\ &= \frac{\partial C_p(\widetilde{s}_k(t))}{\partial \breve{s}_k(t-1, t)} + 2 \frac{\partial C_p(\widetilde{s}_k(t))}{\partial \widetilde{s}_k(t)} \breve{s}_k(t-1, t) \widetilde{s}_k(t-1). \end{split}\end{displaymath} (6.36)
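
The chain rule factors above are the partial derivatives of the marginal variance decomposition of Equation (5.48),

$\displaystyle \widetilde{s}_k(t) = \overset{\circ}{s}_k(t) + \breve{s}_k^2(t-1, t) \widetilde{s}_k(t-1),$

which gives $\partial \widetilde{s}_k(t) / \partial \overset{\circ}{s}_k(t) = 1$, $\partial \widetilde{s}_k(t) / \partial \breve{s}_k(t-1, t) = 2 \breve{s}_k(t-1, t) \widetilde{s}_k(t-1)$ and $\partial \widetilde{s}_k(t+1) / \partial \widetilde{s}_k(t) = \breve{s}_k^2(t, t+1)$.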

The term $\partial C_p(\widetilde{s}_k(t)) / \partial \widetilde{s}_k(t)$ in the above equations cannot be evaluated directly, but again requires the use of new intermediate variables. This leads to the recursive formula

\begin{displaymath}\begin{split} \frac{\partial C_p(\widetilde{s}_k(t))}{\partial \widetilde{s}_k(t)} &= \frac{\partial C_p(\widetilde{s}_k(t+1))}{\partial \widetilde{s}_k(t)} + \frac{\partial C_p(\widetilde{s}_k(t+1))}{\partial \widetilde{s}_k(t+1)} \frac{\partial \widetilde{s}_k(t+1)}{\partial \widetilde{s}_k(t)} \\ &= \frac{\partial C_p(\widetilde{s}_k(t+1))}{\partial \widetilde{s}_k(t)} + \frac{\partial C_p(\widetilde{s}_k(t+1))}{\partial \widetilde{s}_k(t+1)} \breve{s}_k^2(t, t+1). \end{split}\end{displaymath} (6.37)

The terms $\partial C_p(\widetilde{s}_k(t+1)) / \partial \widetilde{s}_k(t)$ are the ones that can now be evaluated as usual with the backward computations through the MLPs.
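
As a concrete illustration, the recursion (6.37) amounts to a single backward pass over the time series. The following minimal Python sketch treats one source component $k$; the names dCp_direct (the terms $\partial C_p(\widetilde{s}_k(t+1)) / \partial \widetilde{s}_k(t)$ from the backward computations) and breve (the coefficients $\breve{s}_k(t-1, t)$) are illustrative, not from the thesis.

    def marginal_variance_gradients(dCp_direct, breve):
        # dCp_direct[t]: partial derivative of C_p w.r.t. the marginal
        #   variance of s_k(t) with the later marginal variances held fixed
        #   (obtained by backpropagation through the MLPs).
        # breve[t]: correlation coefficient for (t-1, t); breve[0] is unused.
        T = len(dCp_direct)
        dCp_total = [0.0] * T
        dCp_total[T - 1] = dCp_direct[T - 1]  # no later time steps at t = T
        # Eq. (6.37): add the effect that s_k(t) has on C_p through s_k(t+1)
        for t in range(T - 2, -1, -1):
            dCp_total[t] = dCp_direct[t] + dCp_total[t + 1] * breve[t + 1] ** 2
        return dCp_total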

The term $\partial C_p(\widetilde{s}_k(t)) / \partial \breve{s}_k(t-1, t)$ is easy to evaluate from Equation (6.31), and it gives

$\displaystyle \frac{\partial C_p(\widetilde{s}_k(t))}{\partial \breve{s}_k(t-1, t)} = -\frac{\partial g_k(\mathbf{s}(t-1))}{\partial s_k(t-1)} \widetilde{s}_k(t-1) \exp(2 \widetilde{v}_{m_k} - 2 \overline{v}_{m_k}).$ (6.38)

Equations (6.36) and (6.38) yield a fixed point update rule for $\breve{s}_k(t-1, t)$:

$\displaystyle \breve{s}_k(t-1, t) = \frac{\partial g_k(\mathbf{s}(t-1))}{\partial s_k(t-1)} \exp(2 \widetilde{v}_{m_k} - 2 \overline{v}_{m_k}) \left( 2 \frac{\partial C_p(\widetilde{s}_k(t))}{\partial \widetilde{s}_k(t)} \right)^{-1}.$ (6.39)

The result depends, for instance, on $\breve{s}_k(t, t+1)$ through Equation (6.37), so the updates must be done starting from the last time step and proceeding backward in time.
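
In code, this ordering can be made explicit by interleaving the recursion (6.37) with the fixed point update (6.39), so that each update already sees the new value of $\breve{s}_k(t, t+1)$. Again a minimal sketch with illustrative names: dg_ds[t] stands for $\partial g_k(\mathbf{s}(t-1)) / \partial s_k(t-1)$ and exp_2v for $\exp(2 \widetilde{v}_{m_k} - 2 \overline{v}_{m_k})$.

    def update_correlations(dCp_direct, dg_ds, breve, exp_2v):
        # Backward-in-time fixed point sweep for the correlation
        # coefficients of one source component k.
        T = len(dCp_direct)
        dCp = dCp_direct[T - 1]  # nothing follows the last time step
        for t in range(T - 1, 0, -1):
            # Eq. (6.39): fixed point update; dCp already reflects the
            # updated coefficients of the later time steps
            breve[t] = dg_ds[t] * exp_2v / (2.0 * dCp)
            # Eq. (6.37): propagate the total derivative one step back
            dCp = dCp_direct[t - 1] + dCp * breve[t] ** 2
        return breve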

The fixed point update rule of the variances $\overset{\circ}{s}_k(t)$ can be solved from Equation (6.35):

$\displaystyle \overset{\circ}{s}_k(t) = \left( 2 \frac{\partial C_p(\widetilde{s}_k(t))}{\partial \widetilde{s}_k(t)} \right)^{-1}.$ (6.40)
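
To make the step explicit: assuming, in line with the treatment of the network weights, that the $C_q$ part of the cost function contributes the term $-\frac{1}{2} \ln \overset{\circ}{s}_k(t)$ (up to additive constants), setting Equation (6.35) to zero gives

$\displaystyle \frac{\partial C_p(\widetilde{s}_k(t))}{\partial \widetilde{s}_k(t)} - \frac{1}{2 \overset{\circ}{s}_k(t)} = 0,$

from which Equation (6.40) follows directly.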

The update rule for the means is similar to that of the weights in Equation (6.34), but it includes a correction which compensates for the simultaneous updates of the sources. The correction is explained in detail in [60].

