Selected publications
 
Novel theoretical and empirical results are useful only to the extent that they are reported to the scientific community. On this page you will find some recent papers and a selection of my older publications. The papers are grouped by topic, and within each topic they are arranged in reverse chronological order (i.e. newest first). The topics are also roughly arranged in reverse chronological order: the first topic (causal inference) describes current work, whereas the last (natural image statistics & computational neuroscience) concerns a topic I no longer work on actively [although please see our recently published book on this topic!].
 
Many of my papers have been widely cited; for citation details see my publications on Google Scholar.
 
 
Causal inference
 
Submitted
 
A. Hyttinen, F. Eberhardt, and P. O. Hoyer
Learning linear cyclic causal models with latent variables
Submitted manuscript (October 2011).
[
pdf ]
[Gives a thorough account of how to learn linear causal models from experiments. Allows for both feedback cycles and latent variables, and does not rely on faithfulness. The manuscript provides a detailed description of the underlying model and of the identifiability results, as well as experimental results on the performance of the method.]
 
2012
 
D. Entner, P. O. Hoyer, and P. Spirtes
Statistical test for consistent estimation of causal effects in linear non-Gaussian models
Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS-2012), La Palma, Canary Islands, 2012.
[
pdf ]
[Shows that, in the linear non-Gaussian case, it is possible to identify whether a given subset of possible confounders, when adjusted for, yields a consistent estimate of a causal effect.]
 
2011
 
S. Shimizu, T. Inazumi, Y. Sogawa, A. Hyvärinen, Y. Kawahara, T. Washio, P. O. Hoyer, and K. Bollen
DirectLiNGAM: A Direct Method for Learning a Linear Non-Gaussian Structural Equation Model
Journal of Machine Learning Research 12: 1225-1248, 2011.
[
pdf ]
[Presents a new estimation technique for the model introduced by Shimizu et al. (2006, below).]
 
A. Hyttinen, F. Eberhardt, and P. O. Hoyer
Noisy-OR models with latent confounding
Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI-2011), Barcelona, Spain, 2011.
[
pdf ]
[Considers discovery of causal models with a noisy-OR parametrization. Shows that identification is possible under experimental conditions very similar to those in the linear case.]
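For readers unfamiliar with the parametrization: in the textbook noisy-OR form, each active parent independently switches the child on with its own probability, and an optional leak term accounts for unmodelled causes. The Python sketch below is a generic illustration under that assumption only; the function name `noisy_or` and the `leak` parameter are hypothetical and not taken from the paper, whose exact parametrization may differ.

```python
from itertools import product


def noisy_or(parent_values, strengths, leak=0.0):
    """Return P(child = 1 | parents) under a textbook noisy-OR model.

    Assumption (illustrative, not the paper's notation): each active parent i
    fails to switch the child on with probability 1 - strengths[i], failures
    are independent, and `leak` is the probability that some unmodelled cause
    switches the child on.
    """
    if len(parent_values) != len(strengths):
        raise ValueError("need exactly one strength per parent")
    p_off = 1.0 - leak  # the child stays off only if the leak does not fire...
    for x, w in zip(parent_values, strengths):
        if x:  # ...and every active parent also fails to switch it on
            p_off *= 1.0 - w
    return 1.0 - p_off


if __name__ == "__main__":
    # Print the full conditional probability table for two binary parents.
    for xs in product([0, 1], repeat=2):
        print(xs, round(noisy_or(xs, [0.8, 0.5], leak=0.1), 3))
```

With strengths 0.8 and 0.5 and a leak of 0.1, both parents active give 1 − 0.9 · 0.2 · 0.5 = 0.91, while no active parent leaves only the leak probability 0.1.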
 
2010
 
A. Hyttinen, F. Eberhardt, and P. O. Hoyer
Causal discovery for linear cyclic models with latent variables
Proceedings of the 5th European Workshop on Probabilistic Graphical Models (PGM-2010), Helsinki, Finland, 2010.
[
pdf ]
[Extends the AISTATS-2010 paper in several directions, including showing how to utilize a faithfulness assumption to derive more causal conclusions when the set of experiments is limited.]
 
D. Entner and P. O. Hoyer
On causal discovery from time series data using FCI
Proceedings of the 5th European Workshop on Probabilistic Graphical Models (PGM-2010), Helsinki, Finland, 2010.
[
pdf ]
[Shows how to modify and apply the FCI algorithm to analyze causal relationships in time-series data in the presence of confounding hidden variables.]
 
A. Moneta, D. Entner, P. O. Hoyer, and A. Coad
Causal inference by Independent Component Analysis with applications to micro- and macroeconomic data
Submitted. Available as Jena Economic Research Papers in Economics 2010-031, Friedrich-Schiller-University Jena.
[
link to working paper ]
[Introduces the use of ICA for causal inference in time-series data to the econometrics community, relating the presented methods to earlier methods for constructing SVAR models and showing how to apply the method to both a microeconomic (firm growth data) and a macroeconomic (effects of monetary policy) problem.]
 
D. Janzing, P. O. Hoyer, and B. Schölkopf
Telling cause from effect based on high-dimensional observations
Proceedings of the 27th International Conference on Machine Learning (ICML-2010), Haifa, Israel, 2010.
[
pdf ]
[Introduces a method based on the structure in the covariances among high-dimensional variables to infer causal directions.]
 
F. Eberhardt, P. O. Hoyer and R. Scheines
Combining experiments to discover linear cyclic models with latent variables
Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS-2010), Sardinia, Italy, 2010.
[
pdf ]
[Shows how to optimally use experiments to discover the interaction graph in linear feedback models with hidden variables.]
 
A. Hyvärinen, K. Zhang, S. Shimizu, and P. O. Hoyer
Estimation of a Structural Vector Autoregression model using non-Gaussianity
Journal of Machine Learning Research 11: 1709-1731, 2010.
[
pdf ]
[A longer version of the ICML-2008 paper cited below. This paper explains how one can analyze time-series data using the LiNGAM model. The presentation of the theory and the examples are targeted to researchers in machine learning.]
 
2009
 
P. O. Hoyer and A. Hyttinen
Bayesian discovery of linear acyclic causal models
Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI-2009), Montreal, Canada, 2009.
[
pdf ]
[Introduces a Bayesian score-based LiNGAM method, which also properly handles data from mixed Gaussian/non-Gaussian linear acyclic models.]
 
S. Shimizu, P. O. Hoyer, and A. Hyvärinen
Estimation of linear non-Gaussian acyclic models for latent factors
Neurocomputing 72: 2024-2027, 2009.
[
pdf ]
[Discusses how to apply the LiNGAM framework to latent factors rather than the measured variables.]
 
P. O. Hoyer, D. Janzing, J. Mooij, J. Peters, and B. Schölkopf
Nonlinear causal discovery with additive noise models
Advances in Neural Information Processing Systems 21 (NIPS*2008), pp. 689-696, 2009.
[
pdf ]
[Shows that the LiNGAM principle can be generalized to nonlinear models with additive noise.]
 
2008
 
P. O. Hoyer, A. Hyvärinen, R. Scheines, P. Spirtes, J. Ramsey, G. Lacerda, and S. Shimizu
Causal discovery of linear acyclic models with arbitrary distributions
Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI-2008), Helsinki, Finland, 2008.
[
pdf ]
[Discusses distribution-equivalence classes for mixed Gaussian/non-Gaussian models and gives a practical method for estimating these.]
 
G. Lacerda, P. Spirtes, J. Ramsey, and P. O. Hoyer
Discovering cyclic causal models by Independent Components Analysis
Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI-2008), Helsinki, Finland, 2008.
[
pdf ]
[Generalizes the LiNGAM estimation method to also handle cyclic (i.e. ‘non-recursive’) structural equation models.]
 
A. Hyvärinen, S. Shimizu, and P. O. Hoyer
Causal modelling combining instantaneous and lagged effects: an identifiable model based on non-Gaussianity
Proceedings of the 25th International Conference on Machine Learning (ICML-2008), pp. 424-431, Helsinki, Finland, 2008.
[
pdf ]
[Shows how to apply LiNGAM analysis to time-series data, so as to estimate a Structural Vector Autoregressive (SVAR) model using non-Gaussianity.]
 
P. O. Hoyer, S. Shimizu, A. J. Kerminen, and M. Palviainen
Estimation of causal effects using linear non-gaussian causal models with hidden variables
International Journal of Approximate Reasoning 49: 362-378, 2008.
[
pdf ]
[Extended journal version of the PGM'06 paper, including extensive simulations of the method. Also includes a brief tutorial on using non-Gaussianity for causal discovery.]
 
2006
 
P. O. Hoyer, S. Shimizu, and A. J. Kerminen
Estimation of linear, non-gaussian causal models in the presence of confounding latent variables
In Proc. Third European Workshop on Probabilistic Graphical Models (PGM'06), pp. 155-162, Prague, Czech Republic, 2006.
[
pdf ]
[Extends our earlier work on LiNGAM (see the JMLR and UAI-2005 papers below) by allowing hidden variables. In particular, discusses when one can identify the complete model and how to infer the model from data. Also provides a small demonstration of learning a model from data. Complete MATLAB code is available (see the 'code' section below).]

S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. J. Kerminen
A linear non-gaussian acyclic model for causal discovery
Journal of Machine Learning Research 7: 2003-2030, 2006.
[
pdf ]
[A longer version of the UAI-2005 paper below. Here, we provide a more extensive discussion on the various statistical tests that can be used, and we describe experiments on real data as well. The corresponding MATLAB code package can be found in the 'code' section below.]

P. O. Hoyer, S. Shimizu, A. Hyvärinen, Y. Kano, and A. J. Kerminen
New permutation algorithms for causal discovery using ICA
In Proc. International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2006), pp. 115-122, Charleston, SC, USA, 2006.
[
pdf ]
[One of the main computational problems encountered in LiNGAM (see the UAI paper below) is finding the correct permutations of the estimated ICA basis matrix. In this paper, we give algorithms for solving the permutation problems, allowing the application of LiNGAM to problems involving tens of variables or more.]
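To see why the permutation step becomes a bottleneck, consider its first part in one common formulation of LiNGAM: choosing the row order of the estimated ICA unmixing matrix W so that no diagonal entry is (near) zero, by minimizing the sum of 1/|W_ii| over the diagonal. The exhaustive Python search below is only a baseline that makes the n! cost explicit; it is not the algorithm proposed in this paper, and the function name `best_row_permutation` is hypothetical.

```python
import math
from itertools import permutations


def best_row_permutation(W):
    """Exhaustively find the row order of W minimizing sum_i 1/|W[perm[i]][i]|.

    Baseline only: it tries all n! orderings, which is exactly what becomes
    infeasible for tens of variables.
    """
    n = len(W)
    best_perm, best_cost = None, math.inf
    for perm in permutations(range(n)):
        cost = 0.0
        for i, row in enumerate(perm):
            d = abs(W[row][i])
            if d == 0.0:  # a zero on the diagonal is never acceptable
                cost = math.inf
                break
            cost += 1.0 / d
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    if best_perm is None:
        raise ValueError("every row order leaves a zero on the diagonal")
    return [list(W[row]) for row in best_perm], best_perm


if __name__ == "__main__":
    # Rows of a 3x3 unmixing matrix returned by ICA in a scrambled order.
    W = [[0.0, 0.0, 1.5],
         [2.0, 0.1, 0.0],
         [-0.7, 1.0, 0.0]]
    W_perm, order = best_row_permutation(W)
    print(order)
    for row in W_perm:
        print(row)
```

Already at 12 variables this loop visits about 479 million orderings, which is why dedicated permutation algorithms are needed.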
 
2005

S. Shimizu, A. Hyvärinen, Y. Kano, and P. O. Hoyer
Discovery of non-gaussian linear causal models using ICA
In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI-2005), pp. 526-533, 2005.
[
pdf ]
[For policy analysis (deciding on actions) it is not enough to model the joint probability distribution of the observed data; one must go further to obtain a causal model. In principle, controlled experiments are the only way of finding such a model. However, if some assumptions hold true, it is sometimes possible to find the causal model based on observational (uncontrolled) data only. In this paper, we show that if the generating model is linear and there are no unobserved confounders, one can estimate all the parameters of the model, and we provide full MATLAB code to perform this estimation (see the 'code' section below). Please also see the new LiNGAM homepage.]
 
 
 
Non-negative representations

P. O. Hoyer
Non-negative Matrix Factorization with sparseness constraints
Journal of Machine Learning Research 5: 1457-1469, 2004.
[
pdf ]
[This paper argues that sparseness constraints are useful when using NMF. We define a sparseness measure and describe a projection operator capable of enforcing any given sparseness. Several examples of decompositions learned from face- and natural image data are used to illustrate the method. The relationships to other recent extensions of NMF are discussed. (This paper is an extension of the ideas that appeared in the conference paper 'Non-negative sparse coding', below.) A corresponding MATLAB code package can be found in the 'code' section below.]
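The sparseness measure defined in the paper is based on the ratio of the L1 and L2 norms: for a vector x with n elements, sparseness(x) = (√n − Σ|x_i| / √(Σ x_i²)) / (√n − 1), which equals 1 when exactly one element is non-zero and 0 when all elements have the same magnitude. A minimal Python sketch of the measure alone (the function name is ours; the projection operator that enforces a target sparseness is not shown):

```python
import math


def sparseness(x):
    """L1/L2-based sparseness of a vector, in [0, 1].

    Returns 1.0 for a vector with a single non-zero entry and 0.0 for a
    vector whose entries all have the same magnitude.
    """
    n = len(x)
    if n < 2:
        raise ValueError("sparseness needs at least two elements")
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    if l2 == 0.0:
        raise ValueError("sparseness is undefined for the zero vector")
    sqrt_n = math.sqrt(n)
    return (sqrt_n - l1 / l2) / (sqrt_n - 1.0)


if __name__ == "__main__":
    print(sparseness([0.0, 0.0, 3.0, 0.0]))   # single non-zero entry
    print(sparseness([1.0, -1.0, 1.0, -1.0]))  # all entries equal in magnitude
    print(sparseness([1.0, 0.5, 0.1, 0.0]))    # somewhere in between
```

Because the measure depends only on the ratio of the two norms, it is invariant to rescaling x, which is what makes it convenient as a constraint on the columns of the NMF factors.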
 
P. O. Hoyer
Non-negative sparse coding
Neural Networks for Signal Processing XII (Proc. IEEE Workshop on Neural Networks for Signal Processing), pp. 557-565, Martigny, Switzerland, 2002.
[
ps | ps.gz | pdf ]
[Introduces Non-negative Sparse Coding (NNSC) as the combination of non-negativity constraints and sparse coding. Also describes an extremely simple algorithm that nevertheless seems to be quite efficient for optimizing the hidden components. The corresponding MATLAB code can be found in the 'code' section of this page. Note: please also see the later paper 'Non-negative matrix factorization with sparseness constraints' above.]
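As a minimal sketch of the hidden-component step, assume the NNSC objective ½‖X − AS‖² + λ Σ S_ij with non-negative data X, basis A and components S. The components can then be updated multiplicatively, S ← S .* (AᵀX) ./ (AᵀAS + λ) (elementwise product and division), which keeps S non-negative and does not increase the objective for a fixed A. The basis step (a projected gradient step on A followed by column normalization) is omitted here. Plain Python lists stand in for matrices to keep the example dependency-free, and all function names are hypothetical.

```python
import random


def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


def transpose(P):
    return [list(col) for col in zip(*P)]


def nnsc_objective(X, A, S, lam):
    """0.5 * squared reconstruction error + lam * sum of all entries of S."""
    AS = matmul(A, S)
    err = sum((x - y) ** 2 for xr, yr in zip(X, AS) for x, y in zip(xr, yr))
    return 0.5 * err + lam * sum(sum(row) for row in S)


def update_components(X, A, S, lam):
    """One multiplicative update S <- S * (A^T X) / (A^T A S + lam)."""
    if lam <= 0:
        raise ValueError("this sketch assumes lam > 0 to avoid division by zero")
    At = transpose(A)
    num = matmul(At, X)
    den = matmul(matmul(At, A), S)
    return [[s * n / (d + lam) for s, n, d in zip(sr, nr, dr)]
            for sr, nr, dr in zip(S, num, den)]


if __name__ == "__main__":
    random.seed(0)
    m, k, n = 4, 3, 5  # data dimension, number of components, number of samples
    X = [[random.random() for _ in range(n)] for _ in range(m)]
    A = [[random.random() for _ in range(k)] for _ in range(m)]
    S = [[random.random() for _ in range(n)] for _ in range(k)]
    for it in range(5):
        print(it, round(nnsc_objective(X, A, S, lam=0.1), 4))
        S = update_components(X, A, S, lam=0.1)
```

The update needs no step size, which is one reason such a scheme is simple to implement; the λ term in the denominator shrinks every component toward zero, which is where the sparseness comes from.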
 
 
 
Feature extraction & pattern recognition
 
A. Vicente, P. O. Hoyer, and A. Hyvärinen
Equivalence of some common linear feature extraction techniques for appearance-based object recognition tasks
IEEE Transactions on Pattern Analysis and Machine Intelligence 29(5): 896-900, 2007.
[
pdf ]
[Shows that ICA features are often equivalent, in terms of classification performance, to the features given by whitening, and discusses how ICA features should be used to improve classification.]


 
 
Natural image statistics & computational neuroscience
 
2009
 
A. Hyvärinen, J. Hurri, and P. O. Hoyer
Natural Image Statistics: A Probabilistic Approach to Early Computational Vision
Springer Verlag, 2009.
[
website with preprint version | springer | amazon.com | amazon.co.uk ]
[An introductory textbook and research monograph on modelling the statistical structure of natural images.]
 
1999-2003

P. O. Hoyer and A. Hyvärinen
Interpreting neural response variability as Monte Carlo sampling of the posterior
In Advances in Neural Information Processing Systems 15 (NIPS*2002), pp. 277-284, MIT Press, 2003.
[
ps | ps.gz | pdf ]
[Attempts to provide a functional explanation for 'noise' in cortical sensory neurons. The corresponding MATLAB code can be found in the 'code' section of this page.]

P. O. Hoyer
Probabilistic Models of Early Vision
Ph.D. thesis, November 2002.
Computer Science Department, Helsinki University of Technology.
Advisor: Dr. Aapo Hyvärinen. Supervisor: Prof. Erkki Oja.
[
ps | ps.gz | pdf ]
[Consists of an introductory part which attempts to explain my research to people outside the field, followed by six research articles. These research articles are not part of this file; they can be downloaded separately from this page.]

P. O. Hoyer
Modeling receptive fields with non-negative sparse coding
Neurocomputing 52-54: 547-552, 2003.
[
ps | ps.gz | pdf ]
[Argues for non-negativity constraints in sparse coding and shows how the proposed method applied to ON/OFF-rectified image data yields features resembling simple-cell receptive fields. The corresponding MATLAB code can be found in the 'code' section of this page.]

P. O. Hoyer and A. Hyvärinen
A Multi-Layer Sparse Coding Network Learns Contour Coding from Natural Images
Vision Research 42(12): 1593-1605, 2002.
(Note that this paper previously had the title "A Non-Negative Sparse Coding Network
Learns Contour Coding and Integration from Natural Images". The title was changed during revision.)
[
ps | ps.gz | pdf ]
[Applies non-negative sparse coding to the simulated responses of complex cells to natural images. This yields higher-order contour-coding units and end-stopped cells. In addition, it is suggested that contour integration can be understood as top-down inference in the presented model. The corresponding MATLAB code can be found in the 'code' section of this page.]

A. Hyvärinen and P. O. Hoyer
A Two-Layer Sparse Coding Model Learns Simple and Complex Cell Receptive Fields and Topography from Natural Images
Vision Research 41(18): 2413-2423, 2001.
[
ps | ps.gz | pdf ]
[Here we show how a two-layer topographic extension of the basic ICA/sparse coding model, when estimated from natural images, gives rise to both V1-like topography and complex-cell properties (in addition to simple-cell classical receptive fields). Related code can be found in the 'code' section of this page.]

A. Hyvärinen, P. O. Hoyer and M. Inki
Topographic Independent Component Analysis
Neural Computation 13(7): 1527-1558, 2001.
[
ps | ps.gz | pdf ]
[Generalizes independent subspace analysis by introducing a more advanced extension of ICA. The dependencies of the estimated "independent" components are visualized as a topographic order. This is a new principle for topographic organization, based on higher-order statistics. The model is applied to both image data and MEG signals. Related code can be found in the 'code' section of this page.]

P. O. Hoyer and A. Hyvärinen
Independent Component Analysis Applied to Feature Extraction from Colour and Stereo Images
Network: Computation in Neural Systems 11(3): 191-210, 2000.
[
ps | ps.gz | pdf ]
[This paper extends previous results on independent components of gray-scale image patches to (a) colour data, and (b) stereo image data. The resulting ICA decompositions are then compared to the receptive-field properties of simple cells in V1.]


A. Hyvärinen and P. O. Hoyer
Emergence of Phase and Shift Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces
Neural Computation 12(7): 1705-1720, 2000.
[
ps | ps.gz | pdf ]
[It is well known that ICA applied to image patches gives features that resemble simple-cell receptive fields. In this article, we introduce a modification of the ICA model and show how, when estimated from image patches, it leads to complex-cell receptive field properties. Related code can be found in the 'code' section of this page.]

P. O. Hoyer
Independent Component Analysis in Image Denoising
Master's Thesis. April 15th, 1999.
[
ps | ps.gz | pdf ]
[This M.Sc. thesis sums up the theory of Sparse Code Shrinkage and gives all the details of my experiments applying it to image data. It also provides the first test of the adaptiveness of the method, by applying it to two quite different image sets and examining the differences.]

A. Hyvärinen, P. O. Hoyer and E. Oja
Sparse Code Shrinkage: Denoising by Nonlinear Maximum Likelihood Estimation
In Advances in Neural Information Processing Systems 11 (NIPS*98), pp. 473-479, MIT Press, 1999.
[
ps | ps.gz | pdf ]
[This conference paper gives the theory of the Sparse Code Shrinkage method in short (but hopefully still understandable) form. For a more detailed account of the theory, see the article by A. Hyvärinen in Neural Computation (11(7): 1739-1768, 1999), available on-line from his home page.]
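The core of Sparse Code Shrinkage is a componentwise nonlinearity applied in a sparsifying (ICA-like) basis: transform the noisy data, shrink each component toward zero, and transform back. For the special case assumed here, a Laplacian component density with standard deviation d and additive Gaussian noise of variance σ², maximizing the posterior gives soft thresholding, ŝ = sign(u) · max(0, |u| − √2 σ²/d). The Python sketch below shows only this nonlinearity; estimating the transform and the other component densities are not covered, and the function name `laplace_shrinkage` is hypothetical.

```python
import math


def laplace_shrinkage(u, sigma, d):
    """MAP estimate of a Laplacian component s from u = s + Gaussian noise.

    Assumes s has a Laplacian density with standard deviation d and the noise
    has standard deviation sigma; the result is soft thresholding at
    sqrt(2) * sigma**2 / d.
    """
    if sigma < 0 or d <= 0:
        raise ValueError("need sigma >= 0 and d > 0")
    threshold = math.sqrt(2.0) * sigma ** 2 / d
    return math.copysign(max(0.0, abs(u) - threshold), u)


if __name__ == "__main__":
    # With sigma = 1 and d = sqrt(2) the threshold is 1.
    for u in (-2.5, -0.5, 0.3, 3.0):
        print(u, laplace_shrinkage(u, sigma=1.0, d=math.sqrt(2.0)))
```

Small components, which are mostly noise under a sparse prior, are set to zero, while large ones are kept but pulled toward zero by a fixed amount; more noise (larger σ) or a less spread-out prior (smaller d) raises the threshold.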