Seminar in Probabilistic Models for Big Data

Algorithms and machine learning
Advanced studies
Probabilistic models are popular tools for data analysis and machine learning. Many of the standard inference algorithms are, however, computationally heavy and can only be used for small-scale applications with a limited amount of data. This seminar is about efficient alternatives that can also be used for big data applications. The focus is on recent theoretical advances in efficient inference applicable to a variety of models, including topics such as stochastic variational inference, stochastic gradient Monte Carlo methods, and other accelerated gradient algorithms.
Year Semester Date Period Language In charge
2014 autumn 03.09-10.12. 1-2 English Antti Honkela


Time Room Lecturer Date
Wed 12-14 C220 Antti Honkela 03.09.2014-15.10.2014
Wed 12-14 C220 Antti Honkela 29.10.2014-10.12.2014



Dr Antti Honkela, Academy Research Fellow
Dr Arto Klami, Academy Research Fellow


Probabilistic models, such as Gaussian processes, hidden Markov models, and topic models, are popular tools for data analysis and machine learning. At the core of all of these methods lies the task of posterior inference: given some observed data, we need to learn the posterior distribution of the parameters that describe the data. Even though this step is in principle straightforward, many of the standard inference algorithms are computationally heavy and hence can only be used for applications with a limited amount of data.
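In symbols (a generic formulation, not tied to any particular model from the seminar), the posterior inference task is to compute

```latex
p(\theta \mid \mathcal{D}) \;=\; \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})},
\qquad
p(\mathcal{D} \mid \theta) \;=\; \prod_{i=1}^{N} p(x_i \mid \theta),
```

where \(\mathcal{D} = \{x_1, \ldots, x_N\}\) denotes the observations. Because the likelihood is a product over all \(N\) data points, a single evaluation of it (or of its gradient) costs \(O(N)\), which is precisely what becomes prohibitive for large data sets.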

Many of the interesting applications, however, produce massive amounts of data. The standard Bayesian techniques are not applicable to the analysis of collections of hundreds of millions of text documents or to modeling very large neuroimaging data sets. There is a clear demand for principled probabilistic inference in these applications as well, and merely scaling up the computational resources is not enough.

This seminar looks at recent advances in probabilistic modeling for big data, seeking answers to the question of whether Bayesian inference can be scaled up to massive data and, if so, how it should be done. Instead of discussing scalable implementations (parallel processing etc.), we will study how the actual inference algorithms can be improved to work with large data collections. The focus is on recent theoretical advances in efficient inference algorithms, including topics such as stochastic variational inference, stochastic gradient Monte Carlo methods, and other accelerated gradient algorithms.
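To give a flavor of these methods, below is a minimal sketch of stochastic gradient Langevin dynamics (Welling & Teh, ICML 2011; topic M1) for a toy Gaussian mean model. The model, step size, and minibatch size here are illustrative choices, not part of the course material, and the original algorithm anneals the step size, which is kept fixed here for simplicity.

```python
import math
import random

random.seed(0)

# Toy data: x_i ~ N(true_mu, 1); the goal is the posterior over mu.
N = 1000
true_mu = 2.0
data = [random.gauss(true_mu, 1.0) for _ in range(N)]

def grad_log_prior(mu, var0=10.0):
    # d/dmu of log N(mu; 0, var0)
    return -mu / var0

def grad_log_lik(x, mu):
    # d/dmu of log N(x; mu, 1)
    return x - mu

def sgld(data, steps=5000, batch=32, eps=1e-3):
    """Stochastic gradient Langevin dynamics with a fixed step size."""
    mu = 0.0
    samples = []
    for t in range(steps):
        mb = random.sample(data, batch)
        # Minibatch gradient rescaled by N/batch: an unbiased estimate
        # of the full-data gradient of the log posterior.
        g = grad_log_prior(mu) + (len(data) / batch) * sum(
            grad_log_lik(x, mu) for x in mb
        )
        # Langevin update: gradient step plus Gaussian noise of matched scale.
        mu += 0.5 * eps * g + random.gauss(0.0, math.sqrt(eps))
        if t > steps // 2:  # discard the first half as burn-in
            samples.append(mu)
    return samples

samples = sgld(data)
post_mean = sum(samples) / len(samples)
```

Each iteration touches only a minibatch, so the per-step cost is O(batch) rather than O(N); this trade of exact gradients for cheap unbiased estimates is the common thread running through most of the reading list below.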


Participants must have completed the courses “Scientific writing” and “Probabilistic models”, or have demonstrated equivalent knowledge. In particular, the participants should be familiar with the basic concepts of probabilistic modeling and Bayesian inference, especially practical inference methods such as Markov chain Monte Carlo methods and variational approximations.

Completing the course

Each participant will study in detail 1-3 research or review articles about the topic (article suggestions will be provided), write a 4-6 page summary of the article(s), and give an oral presentation (in Period II). In addition, everyone will review the summaries of (some) other participants and act as an opponent for the oral presentations.

The preliminary schedule of the course is:

  • September 3rd: Introductory lecture
  • September 10th: Choice of topics
  • October 19th: Deadline for the article summaries
  • November 5th: Deadline for reviews of the article summaries
  • November 25th: Deadline for final versions of the article summaries 
  • The presentations will be given during Period II, roughly between early November and mid December. The exact dates will be determined later.

Literature and material

Slides for the introductory lecture

Background reading

These papers mostly cover the required background.

Presentation topics

The presentations will be about recent scientific articles, chosen primarily from the categorized list below.


Markov chain Monte Carlo methods

M1. S. Ahn, A. Korattikara, M. Welling, Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring, ICML 2012 and M. Welling, Y. W. Teh, Bayesian Learning via Stochastic Gradient Langevin Dynamics, ICML 2011.

M2. S. Patterson, Y. W. Teh, Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex, NIPS 2013.

M3. S. Ahn, B. Shahbaba, M. Welling, Distributed stochastic gradient MCMC, ICML 2014.

M4. T. Chen, E. Fox, C. Guestrin, Stochastic Gradient Hamiltonian Monte Carlo, ICML 2014.

M5. A. Korattikara, Y. Chen, M. Welling, Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget, ICML 2014.

M6. D. Maclaurin, R.P. Adams, Firefly Monte Carlo: Exact MCMC with Subsets of Data, UAI 2014.

M7. W. Neiswanger, E. Xing, C. Wang, Asymptotically Exact, Embarrassingly Parallel MCMC, UAI 2014, S.L. Scott, A.W. Blocker, F.V. Bonassi, H.A. Chipman, E.I. George, R.E. McCulloch, Bayes and Big Data: The Consensus Monte Carlo Algorithm, Bayes 250, 2013, and T. Campbell, J. How, Approximate Decentralized Bayesian Inference, UAI 2014. 

Variational inference

V1. M.D. Hoffman, D.M. Blei, C. Wang, J. Paisley. Stochastic Variational Inference, JMLR 2013 (Sections 1-2 and 5)

V2. M.D. Hoffman, D.M. Blei, C. Wang, J. Paisley. Stochastic Variational Inference, JMLR 2013 (Sections 3-4) and R. Ranganath, C. Wang, D.M. Blei, E. Xing, An Adaptive Learning Rate for Stochastic Variational Inference, ICML 2013.

V3. M. Titsias, M. Lazaro-Gredilla, Doubly Stochastic Variational Bayes for non-Conjugate Inference, ICML 2014.

V4. D.J. Rezende, S. Mohamed, D. Wierstra, Stochastic Backpropagation and Approximate Inference in Deep Generative Models, ICML 2014.

V5. J. Hensman, N. Fusi, N.D. Lawrence, Gaussian Processes for Big Data, UAI 2013.

V6. J. M. Hernandez-Lobato, N. Houlsby, Z. Ghahramani, Stochastic Inference for Scalable Probabilistic Modeling of Binary Matrices, ICML 2014.


Other methods

O1. H. Rue, S. Martino, N. Chopin, Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations, JRSS:B, 2009.

O2. M. Schmidt, N. Le Roux, F. Bach, Minimizing Finite Sums with the Stochastic Average Gradient, arXiv 2013.