Special Course in Unsupervised Machine Learning: Probabilistic Factor Analysis M

582762
2-3
Algorithms and machine learning
Advanced studies
Year Semester Date Period Language In charge
2016 spring 17.05-19.05. 4-4 English Suleiman Khan

Lectures

Time Room Lecturer Date
Tue 12-16 C222 Suleiman Khan 17.05.2016-19.05.2016
Wed 12-16 C222 Suleiman Khan 17.05.2016-19.05.2016
Thu 12-16 C222 Suleiman Khan 17.05.2016-19.05.2016

Exercise groups

Group: 1
Time Room Instructor Date Observe
Wed 10-12 B221 Joseph Sakaya 18.05.2016-19.05.2016
Thu 10-12 B221 Joseph Sakaya 18.05.2016-19.05.2016

General

Description:

This course will discuss probabilistic factor analysis methods within the domain of unsupervised machine learning. Factor analysis approaches are characterized by their ability to learn representations that summarize the data and are, therefore, widely used in data analysis and research. The course will cover factorization methods for matrices and tensors (higher-order matrices of three or more modes), as well as factorizations for multiple joint matrices and tensors.

 

The methods will be introduced with their theoretical and statistical basis, with additional emphasis on their computational implementation and interpretability in practical applications. Exercise sessions will introduce students to the Stan programming interface.

 

Language: English

 

Target Audience:

Master’s students in computer science, machine learning, statistics, data analytics or bioinformatics.

 

Contents:

* Introduction to machine learning and unsupervised learning.

* Matrix factorization: Factor Analysis (FA) and Principal Component Analysis (PCA), estimation as a Gaussian generative model, interpretation and example application.

* Tensor factorization: Canonical decomposition (parallel factors; CP) and Tucker(1,2,3) factorizations, estimation of the models, interpretation and example application.

* Multiple Matrix-Tensor factorization: Canonical Correlation Analysis (CCA) and recent developments, principles and methods, model formulation and applications.
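
To illustrate the "Gaussian generative model" view of factor analysis named in the list above, here is a minimal sketch in Python. It is not course material. It assumes the standard formulation x = W z + mu + eps, with z ~ N(0, I) and eps ~ N(0, diag(psi)), which implies the covariance W W^T + diag(psi). The function names and all numeric values are hypothetical, chosen only for illustration.

```python
import random


def sample_factor_model(W, mu, psi, n, seed=0):
    """Draw n samples from x = W z + mu + eps.

    z ~ N(0, I) are the latent factors, W holds the loadings (d x k),
    and eps ~ N(0, diag(psi)) is independent noise per observed dimension.
    """
    rng = random.Random(seed)
    d, k = len(W), len(W[0])
    data = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        x = [
            mu[i]
            + sum(W[i][j] * z[j] for j in range(k))
            + rng.gauss(0.0, psi[i] ** 0.5)  # gauss takes a std. deviation
            for i in range(d)
        ]
        data.append(x)
    return data


def model_covariance(W, psi):
    """Covariance implied by the model: W W^T + diag(psi)."""
    d, k = len(W), len(W[0])
    return [
        [
            sum(W[i][j] * W[l][j] for j in range(k)) + (psi[i] if i == l else 0.0)
            for l in range(d)
        ]
        for i in range(d)
    ]


def sample_covariance(data):
    """Unbiased sample covariance of a list of equal-length rows."""
    n, d = len(data), len(data[0])
    mean = [sum(row[i] for row in data) / n for i in range(d)]
    return [
        [
            sum((row[i] - mean[i]) * (row[l] - mean[l]) for row in data) / (n - 1)
            for l in range(d)
        ]
        for i in range(d)
    ]


# Hypothetical example: 3 observed dimensions, 2 latent factors.
W = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
mu = [0.0, 1.0, -1.0]
psi = [0.1, 0.2, 0.3]
data = sample_factor_model(W, mu, psi, n=20000)
S = sample_covariance(data)
C = model_covariance(W, psi)
```

With enough samples, the entries of `S` should be close to those of `C`. Estimating the model amounts to running this in reverse: recovering W and psi from observed data.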

 

Credits: 2 + 1

 

Lecturer: Suleiman Ali Khan (D.Sc. Tech.) <suleiman.khan[at]helsinki.fi>

Guest Lecturer: Ali Faisal <ali.faisal[at]opuscapita.com>

Course Assistant: Joseph H Sakaya <joseph.sakaya[at]helsinki.fi>

Evaluation: Attendance at all sessions is mandatory. In addition, students may complete an optional project for an additional credit.

 

Time Table

Lectures: May 17th – 19th, 12:30 – 16:00

Exercises: May 18th – 19th, 10:00 – 11:45

 

Contents

1. Introduction to Unsupervised Machine Learning

a. Machine Learning, Unsupervised Machine Learning and Latent variables

2. Matrix factorization

a. Geometrical and mathematical intuition

b. Factor analysis and PCA

i. Formulation as a generative model

ii. Factor rotation problem and classic methods

iii. Application issues (Interpretation of factors and loadings, scaling, normalization and rank selection)

iv. Example application.

3. Tensor factorization

a. Canonical decomposition (parallel factors, CP)

i. Intuition and formulation as a generative model

ii. Uniqueness and Degeneracy problems

iii. Application issues (Interpretation of factors and loadings, different scaling and normalization, NP-hard rank selection)

iv. Example application.

b. Tucker(1,2,3) factorizations

i. Intuition and model formulation

ii. Factor rotation problem and classic solutions

iii. Application issues

1. Dimensionality reduction, structure identification

2. Interpretation of factors and loadings

3. Mode specific ranks

iv. Example application.

4. Multiple Matrix and Tensor factorization

a. Multiple matrix factorizations

i. Canonical Correlation Analysis (CCA)

1. Principles, method, application.

ii. Group Factor Analysis (GFA)

1. Model formulation, assumptions, interpretation, issues and application.

b. Multi-Tensor factorizations

i. Methods, model formulation, problems and applications.
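
To illustrate the CP (parallel factors) model from item 3a of the outline, here is a minimal Python sketch. It is not course material. It assumes the usual three-way formulation X[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r]. The function names and the factor matrices are hypothetical. The second half shows the scaling indeterminacy that belongs to the "scaling and normalization" issues: rescaling one component in A and compensating in B leaves the tensor unchanged.

```python
def cp_reconstruct(A, B, C):
    """Build a 3-way tensor from CP factor matrices.

    A is I x R, B is J x R, C is K x R; the result is an I x J x K
    nested list with X[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r].
    """
    R = len(A[0])
    return [
        [
            [sum(A[i][r] * B[j][r] * C[k][r] for r in range(R)) for k in range(len(C))]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]


def scale_column(M, r, factor):
    """Return a copy of matrix M with column r multiplied by factor."""
    return [
        [value * factor if col == r else value for col, value in enumerate(row)]
        for row in M
    ]


# Hypothetical rank-2 factors for a 2 x 3 x 2 tensor.
A = [[1, 2], [3, 4]]
B = [[1, 0], [0, 1], [1, 1]]
C = [[1, 1], [2, 0]]
X = cp_reconstruct(A, B, C)

# Scaling indeterminacy: multiply component 0 of A by 2 and of B by 0.5.
A_scaled = scale_column(A, 0, 2.0)
B_scaled = scale_column(B, 0, 0.5)
X_scaled = cp_reconstruct(A_scaled, B_scaled, C)
same_tensor = X_scaled == X
```

Because the two factorizations give the same tensor, CP factors are usually normalized by convention, for example to unit-norm columns with the scale collected in a separate weight per component, before they are interpreted.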