
Bibliography

1
D. K. Arrowsmith and C. M. Place.
An Introduction to Dynamical Systems.
Cambridge University Press, Cambridge, 1990.

2
David Barber and Christopher M. Bishop.
Ensemble learning for multi-layer networks.
In Michael I. Jordan, Michael J. Kearns, and Sara A. Solla, editors, Advances in Neural Information Processing Systems 10, NIPS*97, pages 395-401, Denver, Colorado, USA, Dec. 1-6, 1997, 1998. The MIT Press.

3
Leonard E. Baum, Ted Petrie, George Soules, and Norman Weiss.
A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains.
The Annals of Mathematical Statistics, 41(1):164-171, 1970.

4
J. M. Bernardo.
Psi (digamma) function.
Applied Statistics, 25(3):315-317, 1976.

5
Christopher Bishop.
Neural Networks for Pattern Recognition.
Oxford University Press, Oxford, 1995.

6
Christopher Bishop.
Latent variable models.
In Jordan [29], pages 371-403.

7
Thomas Briegel and Volker Tresp.
Fisher scoring and a mixture of modes approach for approximate inference and learning in nonlinear state space models.
In Michael S. Kearns, Sara A. Solla, and David A. Cohn, editors, Advances in Neural Information Processing Systems 11, NIPS*98, pages 403-409, Denver, Colorado, USA, Nov. 30-Dec. 5, 1998, 1999. The MIT Press.

8
Martin Casdagli.
Nonlinear prediction of chaotic time series.
Physica D, 35(3):335-356, 1989.

9
Martin Casdagli, Stephen Eubank, J. Doyne Farmer, and John Gibson.
State space reconstruction in the presence of noise.
Physica D, 51(1-3):52-98, 1991.

10
Vladimir Cherkassky and Filip Mulier.
Learning from Data: Concepts, Theory, and Methods.
John Wiley & Sons, New York, 1998.

11
Y. J. Chung and C. K. Un.
An MLP/HMM hybrid model using nonlinear predictors.
Speech Communication, 19(4):307-316, 1996.

12
Thomas M. Cover and Joy A. Thomas.
Elements of Information Theory.
Wiley series in telecommunications. John Wiley & Sons, New York, 1991.

13
A. P. Dempster, N. M. Laird, and D. B. Rubin.
Maximum likelihood from incomplete data via the EM algorithm.
Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1-38, 1977.

14
J. Doyne Farmer and John J. Sidorowich.
Predicting chaotic time series.
Physical Review Letters, 59(8):845-848, 1987.

15
Ken-ichi Funahashi.
On the approximate realization of continuous mappings by neural networks.
Neural Networks, 2(3):183-192, 1989.

16
Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin.
Bayesian Data Analysis.
Chapman & Hall/CRC, Boca Raton, 1995.

17
Zoubin Ghahramani.
An introduction to hidden Markov models and Bayesian networks.
International Journal of Pattern Recognition and Artificial Intelligence, 15(1):9-42, 2001.

18
Zoubin Ghahramani and Geoffrey E. Hinton.
Variational learning for switching state-space models.
Neural Computation, 12(4):831-864, 2000.

19
Zoubin Ghahramani and Sam T. Roweis.
Learning nonlinear dynamical systems using an EM algorithm.
In Michael S. Kearns, Sara A. Solla, and David A. Cohn, editors, Advances in Neural Information Processing Systems 11, NIPS*98, pages 599-605, Denver, Colorado, USA, Nov. 30-Dec. 5, 1998, 1999. The MIT Press.

20
Monson H. Hayes.
Statistical Digital Signal Processing and Modeling.
John Wiley & Sons, New York, 1996.

21
Simon Haykin.
Neural Networks: A Comprehensive Foundation.
Prentice-Hall, Englewood Cliffs, second edition, 1998.

22
Simon Haykin and Jose Principe.
Making sense of a complex world.
IEEE Signal Processing Magazine, 15(3):66-81, 1998.

23
Geoffrey E. Hinton and Drew van Camp.
Keeping neural networks simple by minimizing the description length of the weights.
In Proceedings of COLT'93, pages 5-13, Santa Cruz, California, USA, July 26-28, 1993.

24
Sepp Hochreiter and Jürgen Schmidhuber.
Feature extraction through LOCOCODE.
Neural Computation, 11(3):679-714, 1999.

25
Antti Honkela and Juha Karhunen.
An ensemble learning approach to nonlinear independent component analysis.
In Proceedings of the 15th European Conference on Circuit Theory and Design (ECCTD'01), Espoo, Finland, August 28-31, 2001.
To appear.

26
Kurt Hornik, Maxwell Stinchcombe, and Halbert White.
Multilayer feedforward networks are universal approximators.
Neural Networks, 2(5):359-366, 1989.

27
Aapo Hyvärinen, Juha Karhunen, and Erkki Oja.
Independent Component Analysis.
John Wiley & Sons, 2001.
In press.

28
O. L. R. Jacobs.
Introduction to Control Theory.
Oxford University Press, Oxford, second edition, 1993.

29
Michael I. Jordan, editor.
Learning in Graphical Models.
The MIT Press, Cambridge, Massachusetts, 1999.

30
Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, and Lawrence K. Saul.
An introduction to variational methods for graphical models.
In Jordan [29], pages 105-161.

31
Rudolf E. Kalman.
A new approach to linear filtering and prediction problems.
Transactions of the ASME, Journal of Basic Engineering, 82:35-45, 1960.

32
Edward W. Kamen and Jonathan K. Su.
Introduction to Optimal Estimation.
Springer, London, 1999.

33
Mikko Kurimo.
Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models.
PhD thesis, Helsinki University of Technology, Espoo, 1997.
Published in Acta Polytechnica Scandinavica, Mathematics, Computing and Management in Engineering Series No. 87.

34
Harri Lappalainen and Antti Honkela.
Bayesian nonlinear independent component analysis by multi-layer perceptrons.
In Mark Girolami, editor, Advances in Independent Component Analysis, pages 93-121. Springer, Berlin, 2000.

35
Harri Lappalainen and James W. Miskin.
Ensemble learning.
In Mark Girolami, editor, Advances in Independent Component Analysis, pages 76-92. Springer, Berlin, 2000.

36
Peter M. Lee.
Bayesian Statistics: An Introduction.
Arnold, London, second edition, 1997.

37
David J. C. MacKay.
Bayesian interpolation.
Neural Computation, 4(3):415-447, 1992.

38
David J. C. MacKay.
Developments in probabilistic modelling with neural networks--ensemble learning.
In Neural Networks: Artificial Intelligence and Industrial Applications. Proceedings of the 3rd Annual Symposium on Neural Networks, Nijmegen, Netherlands, 14-15 September 1995, pages 191-198, Berlin, 1995. Springer.

39
David J. C. MacKay.
Ensemble learning for hidden Markov models.
Available from http://wol.ra.phy.cam.ac.uk/mackay/, 1997.

40
David J. C. MacKay.
Choice of basis for Laplace approximation.
Machine Learning, 33(1):77-86, 1998.

41
Peter S. Maybeck.
Stochastic Models, Estimation, and Control, volume 1.
Academic Press, New York, 1979.

42
Peter S. Maybeck.
Stochastic Models, Estimation, and Control, volume 2.
Academic Press, New York, 1982.

43
Kevin Murphy.
Switching Kalman filters.
Technical report, Department of Computer Science, University of California Berkeley, 1998.

44
Radford M. Neal.
Bayesian Learning for Neural Networks, volume 118 of Lecture Notes in Statistics.
Springer, New York, 1996.

45
Radford M. Neal and Geoffrey E. Hinton.
A view of the EM algorithm that justifies incremental, sparse, and other variants.
In Jordan [29], pages 355-368.

46
Jacob Palis, Jr. and Welington de Melo.
Geometric Theory of Dynamical Systems.
Springer, New York, 1982.

47
William D. Penny, Richard M. Everson, and Stephen J. Roberts.
Hidden Markov independent component analysis.
In Mark Girolami, editor, Advances in Independent Component Analysis, pages 3-22. Springer, Berlin, 2000.

48
Lawrence R. Rabiner.
A tutorial on hidden Markov models and selected applications in speech recognition.
Proceedings of the IEEE, 77(2):257-286, 1989.

49
S. Neil Rasband.
Chaotic Dynamics of Nonlinear Systems.
John Wiley & Sons, New York, 1990.

50
Sam T. Roweis and Zoubin Ghahramani.
A unifying review of linear Gaussian models.
Neural Computation, 11(2):305-345, 1999.

51
Sam T. Roweis and Zoubin Ghahramani.
An EM algorithm for identification of nonlinear dynamical systems.
Submitted for publication. Preprint available from http://www.gatsby.ucl.ac.uk/~roweis/publications.html, 2000.

52
Walter Rudin.
Real and Complex Analysis.
McGraw-Hill, Singapore, third edition, 1987.

53
Tim Sauer, James A. Yorke, and Martin Casdagli.
Embedology.
Journal of Statistical Physics, 65(3/4):579-616, 1991.

54
Vesa Siivola.
An adaptive method to achieve speaker independence in a speech recognition system.
Master's thesis, Helsinki University of Technology, Espoo, 1999.

55
Floris Takens.
Detecting strange attractors in turbulence.
In David Rand and Lai-Sang Young, editors, Dynamical systems and turbulence, Warwick 1980, volume 898 of Lecture Notes in Mathematics, pages 366-381. Springer, Berlin, 1981.

56
Edmondo Trentin and Marco Gori.
A survey of hybrid ANN/HMM models for automatic speech recognition.
Neurocomputing, 37:91-126, 2001.

57
Harri Valpola.
Bayesian Ensemble Learning for Nonlinear Factor Analysis.
PhD thesis, Helsinki University of Technology, Espoo, 2000.
Published in Acta Polytechnica Scandinavica, Mathematics and Computing Series No. 108.

58
Harri Valpola.
Unsupervised learning of nonlinear dynamic state-space models.
Technical Report A59, Helsinki University of Technology, Espoo, 2000.

59
Harri Valpola, Xavier Giannakopoulos, Antti Honkela, and Juha Karhunen.
Nonlinear independent component analysis using ensemble learning: Experiments and discussion.
In Petteri Pajunen and Juha Karhunen, editors, Proceedings of the Second International Workshop on Independent Component Analysis and Blind Signal Separation, ICA 2000, pages 351-356, Espoo, Finland, June 19-22, 2000.

60
Harri Valpola and Juha Karhunen.
An unsupervised ensemble learning method for nonlinear dynamic state-space models.
A manuscript to be submitted to a journal, 2001.

61
Christopher S. Wallace and D. M. Boulton.
An information measure for classification.
The Computer Journal, 11(2):185-194, 1968.

62
Hassler Whitney.
Differentiable manifolds.
Annals of Mathematics, 37(3):645-680, 1936.



Antti Honkela 2001-05-30