
Updating the network weights

Assume $ \theta_i$ is a weight in one of the MLP networks and that we have evaluated the partial derivatives $ \partial C_p / \partial \overline{\theta}_i$ and $ \partial C_p / \partial \widetilde{\theta}_i$. The posterior variance $ \widetilde{\theta}_i$ is easy to update with a fixed-point rule derived by setting the derivative of the cost to zero; the term $ -1/(2 \widetilde{\theta}_i)$ below arises from differentiating the $ C_q$ part of the cost

$\displaystyle 0 = \frac{\partial C}{\partial \widetilde{\theta}_i} = \frac{\partial C_p}{\partial \widetilde{\theta}_i} - \frac{1}{2 \widetilde{\theta}_i} \quad \Rightarrow \quad \widetilde{\theta}_i = \left( 2 \frac{\partial C_p}{\partial \widetilde{\theta}_i} \right)^{-1}.$ (6.32)
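
As a concrete illustration, the following minimal NumPy sketch applies this fixed-point rule to a whole vector of weight variances at once. It is not part of the original text: the function name and the argument grad_Cp_var (the array of derivatives $ \partial C_p / \partial \widetilde{\theta}_i$, assumed to be produced by the network's gradient computation) are hypothetical, and the floor eps merely guards against division by zero.

import numpy as np

def update_variances(grad_Cp_var, eps=1e-12):
    # Fixed-point rule (6.32): theta_var <- 1 / (2 * dCp/dvar).
    # For Gaussian terms these derivatives are positive; the floor
    # protects against numerical underflow.
    return 1.0 / (2.0 * np.maximum(grad_Cp_var, eps))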

By examining the form of the Gaussian terms of the cost function, we can find an approximation for the second derivative with respect to the mean [34]

$\displaystyle \frac{\partial^2 C}{\partial \overline{\theta}_i^2} \approx 2 \frac{\partial C_p}{\partial \widetilde{\theta}_i} = \frac{1}{\widetilde{\theta}_i}.$ (6.33)
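
To sketch why this holds (an intermediate step along the lines of [34], not spelled out here): when $ \theta_i$ acts as the mean of a Gaussian with variance $ v$, it enters $ C_p$ through a term of the form

$\displaystyle \frac{(\overline{\theta}_i - m)^2 + \widetilde{\theta}_i}{2v},$

so that $ \partial^2 C_p / \partial \overline{\theta}_i^2 = 1/v = 2\, \partial C_p / \partial \widetilde{\theta}_i$. Since $ C_q$ does not depend on $ \overline{\theta}_i$, the same holds for $ C$, and substituting the fixed point (6.32) gives $ 1/\widetilde{\theta}_i$.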

This allows the mean to be updated using an approximate Newton iteration

$\displaystyle \overline{\theta}_i \leftarrow \overline{\theta}_i - \frac{\partial C / \partial \overline{\theta}_i}{\partial^2 C / \partial \overline{\theta}_i^2} \approx \overline{\theta}_i - \frac{\partial C}{\partial \overline{\theta}_i} \widetilde{\theta}_i.$ (6.34)
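
Continuing the same hypothetical sketch (grad_C_mean stands for $ \partial C / \partial \overline{\theta}_i$ and var for the freshly updated $ \widetilde{\theta}_i$), the mean update of (6.34) is then a single multiply-and-subtract:

def update_means(mean, grad_C_mean, var):
    # Approximate Newton step (6.34): the updated posterior variance
    # plays the role of the inverse curvature from (6.33).
    return mean - grad_C_mean * var

# One sweep over all weights might then read:
#   var = update_variances(grad_Cp_var)
#   mean = update_means(mean, grad_C_mean, var)

Using $ \widetilde{\theta}_i$ as the inverse curvature makes the step length adapt to the posterior uncertainty of each weight: poorly determined weights take large steps while precisely determined ones move only slightly.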

Some minor corrections to these update rules are explained in [34].


