
The approximating posterior distribution

The approximating posterior distribution needed in ensemble learning is a distribution over all the possible hidden state sequences $ \boldsymbol{M}$ and the parameter values $ \boldsymbol{\theta}$. The approximation is chosen to be of a factorial form

$\displaystyle q(\boldsymbol{M}, \boldsymbol{\theta}) = q(\boldsymbol{M}) q(\boldsymbol{\theta}).$ (5.13)

The approximation $ q(\boldsymbol {M})$ is a discrete distribution and it factorises as

$\displaystyle q(\boldsymbol{M}) = q(M_1) \prod_{t=1}^{T-1} q(M_{t+1} \vert M_{t}).$ (5.14)

The parameters of this distribution are the discrete probabilities $ q(M_1 = i)$ and $ q(M_{t+1} = i \vert M_{t} = j)$.
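Because $ q(\boldsymbol{M})$ factorises as a Markov chain, a hidden state sequence can be drawn from it one state at a time, using only the initial probabilities and the transition conditionals. The following sketch illustrates this; the dimensions and the parameter values are hypothetical, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: M hidden states, sequence length T.
M, T = 3, 5

# Parameters of q(M): initial probabilities q(M_1 = i) and the
# transition conditionals q(M_{t+1} = i | M_t = j), one row per j.
q_init = np.full(M, 1.0 / M)
q_trans = rng.dirichlet(np.ones(M), size=M)

def sample_state_sequence():
    """Draw one state sequence from the chain-factorised q(M)."""
    states = [rng.choice(M, p=q_init)]
    for _ in range(T - 1):
        states.append(rng.choice(M, p=q_trans[states[-1]]))
    return states

seq = sample_state_sequence()
```

Only the factors $ q(M_1)$ and $ q(M_{t+1} \vert M_{t})$ are ever needed, so the full joint distribution over the exponentially many sequences is never represented explicitly.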

The distribution $ q(\boldsymbol {\theta })$ is also formed as a product of independent distributions for the different parameters. The parameters with Dirichlet priors have posterior approximations that are either a single Dirichlet distribution, as for $ \boldsymbol{\pi}$,

$\displaystyle q( \boldsymbol{\pi} ) = \ensuremath{\text{Dirichlet}}( \boldsymbol{\pi};\; \hat{\boldsymbol{\pi}} ),$ (5.15)

or a product of Dirichlet distributions, as for $ \mathbf{A}$,

$\displaystyle q( \mathbf{A}) = \prod_{i=1}^M \ensuremath{\text{Dirichlet}}( \mathbf{a}_i;\; \hat{\mathbf{a}}_i).$ (5.16)

These are in fact the optimal choices among all possible distributions, given the factorisation $ q(\boldsymbol{M}, \boldsymbol{\pi}, \mathbf{A}) =
q(\boldsymbol{M}) q(\boldsymbol{\pi}) q(\mathbf{A})$.
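Once the Dirichlet parameters $ \hat{\boldsymbol{\pi}}$ and $ \hat{\mathbf{a}}_i$ are available, posterior expectations of the probabilities follow directly from the standard Dirichlet mean formula. A minimal sketch with made-up parameter values:

```python
import numpy as np

# Hypothetical Dirichlet posterior parameters \hat{pi} for q(pi).
pi_hat = np.array([2.0, 5.0, 3.0])

# Posterior mean of pi under Dirichlet(pi; pi_hat): normalise the
# parameter vector to sum to one.
pi_mean = pi_hat / pi_hat.sum()

# Each row a_i of A gets an independent Dirichlet(a_i; \hat{a}_i)
# posterior, so the rows of the mean transition matrix are
# normalised independently.
a_hat = np.array([[1.0, 4.0, 2.0],
                  [3.0, 2.0, 1.0],
                  [2.0, 2.0, 2.0]])
A_mean = a_hat / a_hat.sum(axis=1, keepdims=True)
```

The posterior means are valid probability vectors by construction, which is why the conjugate Dirichlet form is convenient here.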

The parameters with Gaussian priors have Gaussian posterior approximations of the form

$\displaystyle q( \theta_i ) = N(\theta_i;\; \overline{\theta_i}, \widetilde{\theta_i}).$ (5.17)

Here $ \overline{\theta_i}$ denotes the posterior mean and $ \widetilde{\theta_i}$ the posterior variance of $ \theta_i$. All these parameters are assumed to be independent.


Antti Honkela 2001-05-30