
Updating the network weights

Assume $ \theta_i$ is a weight in one of the MLP networks and that we have evaluated the partial derivatives $ \partial C_p / \partial \overline{\theta}_i$ and $ \partial C_p / \partial \widetilde{\theta}_i$. The posterior variance $ \widetilde{\theta}_i$ is easy to update with a fixed-point rule derived by setting the derivative of the cost to zero; the term $ -1/(2 \widetilde{\theta}_i)$ below arises from differentiating the $ C_q$ part of the cost

$\displaystyle 0 = \frac{\partial C}{\partial \widetilde{\theta}_i} = \frac{\partial C_p}{\partial \widetilde{\theta}_i} - \frac{1}{2 \widetilde{\theta}_i} \quad \Rightarrow \quad \widetilde{\theta}_i = \left( 2 \frac{\partial C_p}{\partial \widetilde{\theta}_i} \right)^{-1}.$ (6.32)
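
As a concrete illustration, the following minimal NumPy sketch applies this fixed-point rule to a whole vector of weight variances at once. It is not part of the original text: the function name and the argument grad_Cp_var (the array of derivatives $ \partial C_p / \partial \widetilde{\theta}_i$, assumed to be produced by the network's gradient computation) are hypothetical, and the floor eps merely guards against division by zero.

import numpy as np

def update_variances(grad_Cp_var, eps=1e-12):
    # Fixed-point rule (6.32): theta_var <- 1 / (2 * dCp/dvar).
    # For Gaussian terms these derivatives are positive; the floor
    # protects against numerical underflow.
    return 1.0 / (2.0 * np.maximum(grad_Cp_var, eps))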

By examining the form of the Gaussian terms of the cost function, we can find an approximation for the second derivative with respect to the mean [34]

$\displaystyle \frac{\partial^2 C}{\partial \overline{\theta}_i^2} \approx 2 \frac{\partial C_p}{\partial \widetilde{\theta}_i} = \frac{1}{\widetilde{\theta}_i}.$ (6.33)
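
To sketch why this holds (an intermediate step along the lines of [34], not spelled out here): when $ \theta_i$ acts as the mean of a Gaussian with variance $ v$, it enters $ C_p$ through a term of the form

$\displaystyle \frac{(\overline{\theta}_i - m)^2 + \widetilde{\theta}_i}{2v},$

so that $ \partial^2 C_p / \partial \overline{\theta}_i^2 = 1/v = 2\, \partial C_p / \partial \widetilde{\theta}_i$. Since $ C_q$ does not depend on $ \overline{\theta}_i$, the same holds for $ C$, and substituting the fixed point (6.32) gives $ 1/\widetilde{\theta}_i$.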

This allows the mean to be updated using an approximate Newton iteration

$\displaystyle \overline{\theta}_i \leftarrow \overline{\theta}_i - \frac{\partial C / \partial \overline{\theta}_i}{\partial^2 C / \partial \overline{\theta}_i^2} \approx \overline{\theta}_i - \frac{\partial C}{\partial \overline{\theta}_i} \widetilde{\theta}_i.$ (6.34)
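
Continuing the same hypothetical sketch (grad_C_mean stands for $ \partial C / \partial \overline{\theta}_i$ and var for the freshly updated $ \widetilde{\theta}_i$), the mean update of (6.34) is then a single multiply-and-subtract:

def update_means(mean, grad_C_mean, var):
    # Approximate Newton step (6.34): the updated posterior variance
    # plays the role of the inverse curvature from (6.33).
    return mean - grad_C_mean * var

# One sweep over all weights might then read:
#   var = update_variances(grad_Cp_var)
#   mean = update_means(mean, grad_C_mean, var)

Using $ \widetilde{\theta}_i$ as the inverse curvature makes the step length adapt to the posterior uncertainty of each weight: poorly determined weights take large steps while precisely determined ones move only slightly.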

Some minor corrections to these update rules are explained in [34].


