# Seminar in Probabilistic Models for Big Data

Year | Semester | Date | Period | Language | In charge |
---|---|---|---|---|---|
2014 | autumn | 03.09-10.12. | 1-2 | English | Antti Honkela |

## Lectures

Time | Room | Lecturer | Date |
---|---|---|---|
Wed 12-14 | C220 | Antti Honkela | 03.09.2014-15.10.2014 |
Wed 12-14 | C220 | Antti Honkela | 29.10.2014-10.12.2014 |

## General

#### Instructors:

Dr Antti Honkela, Academy Research Fellow

Dr Arto Klami, Academy Research Fellow

#### Introduction:

Probabilistic models, such as Gaussian processes, hidden Markov models, and topic models, are popular tools for data analysis and machine learning. At the core of all of these methods lies the task of posterior inference: given some observed data, we need to learn the posterior distribution of the parameters that describe the data. Even though this step is in principle straightforward, many of the standard inference algorithms are computationally heavy and hence can only be used for applications with a limited amount of data.

Many of the interesting applications, however, produce massive amounts of data. The standard Bayesian techniques are not applicable to the analysis of collections of hundreds of millions of text documents or to the modeling of very large neuroimaging data sets. There is a clear demand for justified probabilistic inference in these applications as well, and merely scaling up the computational resources is not enough.

This seminar looks at the recent advances in probabilistic modeling for big data, seeking answers to the question of whether Bayesian inference can be scaled up for massive data and how it should be done. Instead of discussing scalable implementations (parallel processing etc.), we will study how the actual inference algorithms can be improved to work with large data collections. The focus is on recent theoretical advances in efficient inference algorithms, including topics such as stochastic variational inference, stochastic gradient Monte Carlo methods, and other accelerated gradient algorithms.

#### Prerequisites:

Participants must have completed the courses “Scientific writing” and “Probabilistic models”, or have demonstrated equivalent knowledge. In particular, the participants should be familiar with the basic concepts of probabilistic modeling and Bayesian inference, especially practical inference methods such as Markov chain Monte Carlo methods and variational approximations.

## Completing the course

Each participant will study in detail 1-3 research or review articles about the topic (article suggestions will be provided), write a 4-6 page summary of the article(s), and give an oral presentation (in Period II). In addition, everyone will review the summaries of (some) other participants and act as an opponent for the oral presentations.

The preliminary schedule of the course is:

- September 3rd: Introductory lecture
- September 10th: Choice of topics
- October 19th: Deadline for the article summaries
- November 5th: Deadline for reviews of the article summaries
- November 25th: Deadline for final versions of the article summaries
- The presentations will be given during Period II, roughly between early November and mid-December. The exact dates will be determined later.

## Literature and material

#### Slides for the introductory lecture

The introductory lecture slides

#### Background reading

These papers mostly cover the required background.

- Z. Ghahramani (2004). Unsupervised Learning. In Bousquet, O., Raetsch, G. and von Luxburg, U. (eds) Advanced Lectures on Machine Learning, LNAI 3176. Springer-Verlag. Especially sections 1-6, 10-11.4.
- C. Andrieu, N. de Freitas, A. Doucet, M. I. Jordan (2003). An Introduction to MCMC for Machine Learning. Machine Learning 50:5-43. Especially sections 2, 2.1, 3.1, 3.4.
- C. W. Fox, S. J. Roberts (2012). A tutorial on variational Bayesian inference. Artif Intell Rev 38:85-95. Especially sections 1.1, 1.3-1.5.

#### Presentation topics

The presentations will be about recent scientific articles, chosen primarily from the categorized list below.

##### MCMC

M1. S. Ahn, A. Korattikara, M. Welling, Bayesian posterior sampling via stochastic gradient Fisher scoring, ICML 2012 and M. Welling, Y.W. Teh, Bayesian Learning via Stochastic Gradient Langevin Dynamics, ICML 2011.

M2. S. Patterson, Y.W. Teh, Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex, NIPS 2013.

M3. S. Ahn, B. Shahbaba, M. Welling, Distributed stochastic gradient MCMC, ICML 2014.

M4. T. Chen, E. Fox, C. Guestrin, Stochastic Gradient Hamiltonian Monte Carlo, ICML 2014.

M5. A. Korattikara, Y. Chen, M. Welling, Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget, ICML 2014.

M6. D. Maclaurin, R.P. Adams, Firefly Monte Carlo: Exact MCMC with Subsets of Data, UAI 2014.

M7. W. Neiswanger, C. Wang, E. Xing, Asymptotically Exact, Embarrassingly Parallel MCMC, UAI 2014; S.L. Scott, A.W. Blocker, F.V. Bonassi, H.A. Chipman, E.I. George, R.E. McCulloch, Bayes and Big Data: The Consensus Monte Carlo Algorithm, Bayes 250, 2013; and T. Campbell, J. How, Approximate Decentralized Bayesian Inference, UAI 2014.

##### Variational inference

V1. M.D. Hoffman, D.M. Blei, C. Wang, J. Paisley. Stochastic Variational Inference, JMLR 2013 (Sections 1-2 and 5)

V2. M.D. Hoffman, D.M. Blei, C. Wang, J. Paisley. Stochastic Variational Inference, JMLR 2013 (Sections 3-4) and R. Ranganath, C. Wang, D.M. Blei, E. Xing, An Adaptive Learning Rate for Stochastic Variational Inference, ICML 2013.

V3. M. Titsias, M. Lazaro-Gredilla, Doubly Stochastic Variational Bayes for non-Conjugate Inference, ICML 2014.

V4. D.J. Rezende, S. Mohamed, D. Wierstra, Stochastic Backpropagation and Approximate Inference in Deep Generative Models, ICML 2014.

V5. J. Hensman, N. Fusi, N.D. Lawrence, Gaussian Processes for Big Data, UAI 2013.

V6. J. M. Hernandez-Lobato, N. Houlsby, Z. Ghahramani, Stochastic Inference for Scalable Probabilistic Modeling of Binary Matrices, ICML 2014.

##### Other

O1. H. Rue, S. Martino, N. Chopin, Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations, JRSS:B, 2009.

O2. M. Schmidt, N. Le Roux, F. Bach, Minimizing Finite Sums with the Stochastic Average Gradient, arXiv 2013.