Finding optimal $q(\boldsymbol{\theta})$ for Gaussian parameters

As an example of Gaussian parameters we shall consider $m_k(i)$ and $v_k(i)$. All the others are handled in essentially the same way, the only difference being that no weighting over the different states is needed.

To simplify the notation, all the indices of $m_k(i)$ and $v_k(i)$ are dropped for the remainder of this section. The relevant terms of the cost function are then, up to an additive constant,

\begin{displaymath}\begin{split}
C(m, v) &= \sum_{t=1}^T q(M_t = i) \left(\overline{v} + \frac{1}{2} \left[(x(t) - \overline{m})^2 + \widetilde{m}\right] \exp(2\widetilde{v} - 2\overline{v})\right) \\
&\quad + \overline{v}_{m} + \frac{1}{2} \left[(\overline{m} - \overline{m}_{m})^2 + \widetilde{m}\right] \exp(2\widetilde{v}_{m} - 2\overline{v}_{m}) - \frac{1}{2} \log \widetilde{m} \\
&\quad + \overline{v}_{v} + \frac{1}{2} \left[(\overline{v} - \overline{m}_{v})^2 + \widetilde{v}\right] \exp(2\widetilde{v}_{v} - 2\overline{v}_{v}) - \frac{1}{2} \log \widetilde{v}.
\end{split}\end{displaymath} (6.18)

Let us denote $\sigma^2_{\text{eff}} = \exp(2\overline{v} - 2\widetilde{v})$, $\sigma^2_{m, \text{eff}} = \exp(2\overline{v}_m - 2\widetilde{v}_m)$ and $\sigma^2_{v, \text{eff}} = \exp(2\overline{v}_v - 2\widetilde{v}_v)$.
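These effective variances arise from averaging over the Gaussian posterior approximations of the log-standard-deviation parameters. For a Gaussian $q(v)$ with mean $\overline{v}$ and variance $\widetilde{v}$, the moment generating function of the Gaussian distribution gives

$\displaystyle \operatorname{E}_{q(v)}\left[\exp(-2v)\right] = \exp(-2\overline{v} + 2\widetilde{v}) = \frac{1}{\sigma^2_{\text{eff}}},$

so the quadratic likelihood terms of Equation (6.18) can equivalently be written as $\frac{1}{2 \sigma^2_{\text{eff}}} \left[(x(t) - \overline{m})^2 + \widetilde{m}\right]$, and the hyperparameter terms correspondingly.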

The derivative of this expression with respect to $\widetilde{m}$ is easy to evaluate:

$\displaystyle \frac{\partial C}{\partial \widetilde{m}} = \sum_{t=1}^T q(M_t = i) \frac{1}{2 \sigma^2_{\text{eff}}} + \frac{1}{2 \sigma^2_{m, \text{eff}}} - \frac{1}{2 \widetilde{m}}.$ (6.19)

Setting this to zero gives

$\displaystyle \widetilde{m} = \left( \sum_{t=1}^T q(M_t = i) \frac{1}{\sigma^2_{\text{eff}}} + \frac{1}{\sigma^2_{m, \text{eff}}} \right)^{-1}.$ (6.20)

The derivative with respect to $\overline{m}$ is

$\displaystyle \frac{\partial C}{\partial \overline{m}} = \sum_{t=1}^T q(M_t = i) \frac{1}{\sigma^2_{\text{eff}}} \left[ \overline{m} - x(t) \right] + \frac{1}{\sigma^2_{m, \text{eff}}} \left[ \overline{m} - \overline{m}_{m} \right]$ (6.21)

which has a zero at

$\displaystyle \overline{m} = \left[ \sum_{t=1}^T q(M_t = i) \frac{1}{\sigma^2_{\text{eff}}}\, x(t) + \frac{1}{\sigma^2_{m, \text{eff}}}\, \overline{m}_{m} \right] \widetilde{m}$ (6.22)

where $\widetilde{m}$ is given by Equation (6.20).
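Equations (6.20) and (6.22) have the familiar Gaussian conjugate form: the precisions of the weighted data terms and the prior add up, and the posterior mean is the corresponding precision-weighted average. As a minimal illustration, a NumPy sketch of this update might look as follows; the function and argument names are hypothetical and simply mirror the notation of this section.

import numpy as np

def update_q_m(x, q_M, v_bar, v_tilde, m_bar_m, v_bar_m, v_tilde_m):
    """Update q(m) = N(m_bar, m_tilde) using Equations (6.20) and (6.22).

    x: observations x(1), ..., x(T); q_M: the weights q(M_t = i).
    The remaining arguments are the prior mean of m and the posterior
    means and variances of the log-std parameters, as in the text.
    """
    sigma2_eff = np.exp(2.0 * v_bar - 2.0 * v_tilde)        # effective data variance
    sigma2_m_eff = np.exp(2.0 * v_bar_m - 2.0 * v_tilde_m)  # effective prior variance
    # Equation (6.20): precisions of the data terms and the prior add up.
    m_tilde = 1.0 / (np.sum(q_M) / sigma2_eff + 1.0 / sigma2_m_eff)
    # Equation (6.22): precision-weighted average of the data and the prior mean.
    m_bar = (np.sum(q_M * x) / sigma2_eff + m_bar_m / sigma2_m_eff) * m_tilde
    return m_bar, m_tilde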

The solutions for the parameters of $q(m)$ are exact: the true posterior of these parameters is also Gaussian, so the approximation coincides with it. This is not the case for the parameters of $q(v)$, as the true posterior of $v$ is not Gaussian. The best Gaussian approximation with respect to the chosen criterion can nevertheless be found by solving for the zero of the derivative of the cost function with respect to the parameters of $q(v)$. This is done using Newton's iteration.

The derivatives with respect to $\overline{v}$ and $\widetilde{v}$ are

$\displaystyle \frac{\partial C}{\partial \overline{v}} = \sum_{t=1}^T q(M_t = i) \left(1 - \left[(x(t) - \overline{m})^2 + \widetilde{m}\right] \exp(2\widetilde{v} - 2\overline{v}) \right) + \frac{\overline{v} - \overline{m}_{v}}{\sigma^2_{v,\text{eff}}}$ (6.23)

$\displaystyle \frac{\partial C}{\partial \widetilde{v}} = \sum_{t=1}^T q(M_t = i) \left[(x(t) - \overline{m})^2 + \widetilde{m}\right] \exp(2\widetilde{v} - 2\overline{v}) + \frac{1}{2 \sigma^2_{v,\text{eff}}} - \frac{1}{2 \widetilde{v}}.$ (6.24)

These are set to zero and solved with Newton's iteration.
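As a sketch of how this Newton's iteration could be implemented, the fragment below updates $\overline{v}$ and $\widetilde{v}$ jointly. The Hessian is obtained by differentiating Equations (6.23) and (6.24) once more; the fixed iteration count and the step-halving safeguard that keeps $\widetilde{v}$ positive are illustrative choices, not details given in the text.

import numpy as np

def update_q_v(x, q_M, m_bar, m_tilde, m_bar_v, v_bar_v, v_tilde_v,
               v_bar, v_tilde, n_iter=10):
    """Newton's iteration for q(v) = N(v_bar, v_tilde), Equations (6.23)-(6.24)."""
    sigma2_v_eff = np.exp(2.0 * v_bar_v - 2.0 * v_tilde_v)
    A = np.sum(q_M * ((x - m_bar) ** 2 + m_tilde))  # data-dependent factor
    N = np.sum(q_M)
    for _ in range(n_iter):
        E = A * np.exp(2.0 * v_tilde - 2.0 * v_bar)
        # Gradient, Equations (6.23) and (6.24).
        g = np.array([N - E + (v_bar - m_bar_v) / sigma2_v_eff,
                      E + 0.5 / sigma2_v_eff - 0.5 / v_tilde])
        # Hessian, from differentiating the gradient once more.
        H = np.array([[2.0 * E + 1.0 / sigma2_v_eff, -2.0 * E],
                      [-2.0 * E, 2.0 * E + 0.5 / v_tilde ** 2]])
        step = np.linalg.solve(H, g)
        while v_tilde - step[1] <= 0.0:  # halve the step until v_tilde stays positive
            step = step / 2.0
        v_bar, v_tilde = v_bar - step[0], v_tilde - step[1]
    return v_bar, v_tilde

For $\widetilde{v} > 0$ the Hessian above is positive definite, so the Newton step is a descent direction on the cost.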

