Unsupervised Machine Learning

Basic information

Course code: 582638

Credit units: 5

Subprogramme: Algorithms and machine learning

Level: Advanced studies

Description:

Unsupervised learning is one of the main streams of machine learning and is closely related to multivariate statistics and data mining. This course describes some of the main methods in unsupervised learning, such as principal and independent component analysis, clustering, and nonlinear dimension reduction methods. In recent years, machine learning has become heavily dependent on statistical theory, which is why this course sits on the borderline between statistics and computer science. Emphasis is put both on the statistical/probabilistic formulation of the methods and on their computational implementation. The course is intended for CS students in the algorithms and machine learning specialisation, statistics students, and mathematics students in the statistical machine learning specialisation.

Exam

07.05.2015 16.00 A111

Year	Semester	Date	Period	Language	In charge
2015	spring	10.03-30.04.	4-4	English	Aapo Hyvärinen

Lectures

Time	Room	Lecturer	Date
Tue 14-16	C222	Aapo Hyvärinen	10.03.2015-30.04.2015
Thu 14-16	C222	Aapo Hyvärinen	10.03.2015-30.04.2015
Fri 14-16	C222	Aapo Hyvärinen	10.03.2015-30.04.2015

Information for international students

The course will be taught entirely in English.

General

Please note: this may be the last time this course is given, so take it this spring or never!

Target audience

Master's students in computer science (specialization in algorithms, data analytics & machine learning, or bioinformatics), applied mathematics (specialization in statistical machine learning or, e.g., stochastics), or statistics.

Description

Unsupervised learning is one of the main streams of machine learning and is closely related to exploratory data analysis and data mining. This course describes some of the main methods in unsupervised learning.

In recent years, machine learning has become heavily dependent on statistical theory, which is why this course sits on the borderline between statistics and computer science. Emphasis is put both on the statistical formulation of the methods and on their computational implementation.

The goal is not only to introduce the methods on a theoretical level but also to show how they can be implemented in scientific computing environments. Computer projects are therefore an important part of the course, but they earn separate credits (see Projects for Unsupervised Machine Learning). The projects will be introduced in the session marked in the schedule below.

Exercises are given and coordinated by Jouni Puuronen (puuronen at mappi.helsinki.fi).

Timetable

One of the weekly sessions (normally Friday) is an exercise session. The timetable is as follows:

Tue 10 Mar	Lecture	Thu 12 Mar	Lecture	Fri 13 Mar	Lecture
Tue 17 Mar	Lecture	Thu 19 Mar	Lecture	Fri 20 Mar	Exercises
Tue 24 Mar	Lecture	Thu 26 Mar	Lecture	Fri 27 Mar	Exercises
Tue 31 Mar	Lecture	Thu 2 Apr	Holiday (Easter)	Fri 3 Apr	Holiday (Easter)
Tue 7 Apr	Holiday (Easter)	Thu 9 Apr	Intro to projects	Fri 10 Apr	Exercises
Tue 14 Apr	Lecture	Thu 16 Apr	Lecture	Fri 17 Apr	Exercises
Tue 21 Apr	Lecture	Thu 23 Apr	Lecture	Fri 24 Apr	Exercises
Tue 28 Apr	Lecture	Thu 30 Apr	Exercises	Fri 1 May	Holiday (Vappu)

Prerequisites

Computer science majors: Bachelor's degree strongly recommended. It should include the following mathematics courses: calculus (including vector calculus), linear algebra I&II, introduction to probability. You should also have completed "Introduction to machine learning" (given in Period II).
Statistics majors: Bachelor's degree recommended.
Mathematics majors: Bachelor's degree recommended. It should include basic courses in calculus (including vector calculus), linear algebra I&II, introduction to probability, introduction to statistical inference.

Contents

Introduction. Supervised vs. unsupervised learning. Applications of unsupervised learning. Probabilistic formulation. Review of some basic mathematics (linear algebra, probability).
Numerical optimization. Gradient method, Newton's method, stochastic gradient methods, projected gradient methods.
Principal component analysis and factor analysis. Formulation as minimization of reconstruction error or maximization of component variance. Computation using the covariance matrix and its eigenvalue decomposition. Factor analysis and interpretation of PCA as estimation of a Gaussian generative model. Factor rotations.
Independent component analysis. Problem of blind source separation, why non-Gaussianity is needed for identifiability. Correlation vs. independence. ICA as maximization of non-Gaussianity, measurement of non-Gaussianity by cumulants. Likelihood of the model and maximum likelihood estimation. Information-theoretic approach. Implementation by gradient methods and FastICA. Applications of component analysis.
Sparse coding and dictionary learning. Formulation as ICA with more components than observed variables (overcomplete basis). Olshausen-Field model.
Clustering. K-means algorithm. Gaussian mixture model: maximization of likelihood, EM algorithm.
Nonlinear dimension reduction. Non-metric multi-dimensional scaling and related methods: kernel PCA, Laplacian eigenmaps, IsoMap. Kohonen's self-organizing map.
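As an illustration of the PCA item in the outline above (computation using the covariance matrix and its eigenvalue decomposition), here is a minimal, dependency-free Python sketch; it is not taken from the course material. It computes only the first principal component, using power iteration on the covariance matrix as a stand-in for a full eigenvalue decomposition (in practice a library routine such as numpy.linalg.eigh would return all eigenpairs at once). The function names are hypothetical.

```python
import math
import random


def covariance_matrix(data):
    """Return (sample covariance matrix dividing by n - 1, mean-centred rows)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def leading_eigenpair(matrix, iterations=1000, tol=1e-12, seed=0):
    """Power iteration for a symmetric positive semidefinite matrix.

    Converges to the eigenvector of the largest eigenvalue, assuming that
    eigenvalue is strictly larger than the others.
    """
    d = len(matrix)
    rng = random.Random(seed)  # random start: almost surely not orthogonal to the target
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # v lies in the null space; nothing more to extract
            break
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    # The Rayleigh quotient v^T C v is the variance of the data along v.
    cv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return eigenvalue, v


def first_principal_component(data):
    """Return (unit direction, variance along it, projected scores) of the first PC."""
    cov, centered = covariance_matrix(data)
    variance, direction = leading_eigenpair(cov)
    scores = [sum(a * b for a, b in zip(row, direction)) for row in centered]
    return direction, variance, scores
```

For points lying exactly on the line y = 2x, e.g. [(x, 2 * x) for x in range(-3, 4)], the direction returned is ±(1, 2)/√5 and the variance along it is 70/3, which here equals the total variance of the data. The sign of the direction is arbitrary, as for any eigenvector.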
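The K-means algorithm listed under Clustering alternates two steps: assign every point to its nearest centre, then move every centre to the mean of the points assigned to it. The following is a minimal Python sketch of this procedure (Lloyd's algorithm), not the course's reference implementation. The function name, the random choice of k data points as initial centres, the stopping rule (assignments no longer change), and keeping an empty cluster's old centre are all illustrative assumptions.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Hypothetical helper: Lloyd's algorithm for K-means on a list of tuples.

    Returns (centres, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Assumption: initial centres are k data points drawn at random.
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centre in squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: each centre moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old centre
                centres[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centres, labels
```

On two well-separated groups such as [(0, 0), (0, 1), (1, 0)] and [(10, 10), (10, 11), (11, 10)], the sketch puts each group in its own cluster, with centres (1/3, 1/3) and (31/3, 31/3) in either order. In general the result depends on the initial centres, which is one reason K-means is often run from several initialisations.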

Completing the course

There will be a single exam at the end of the course (with renewal exams and separate exams according to the usual departmental rules). Check the exact time and place on the CS department's exam page.

Active participation in the exercise sessions will give you points for the exam; details are in this directory.

Literature and material

The complete lecture notes for this year's course can be downloaded here. To keep search engines away, you need the login uml and password uml. There is no book for the course.

Exercises will be made available, session by session, in this directory, which also contains detailed information on how exercise sessions work and how you can get extra points for the exam.

Address: Department of Computer Science, P.O. 68 (Gustaf Hällströmin katu 2b), FI-00014 UNIVERSITY OF HELSINKI, FINLAND
Opening Hours: During spring and autumn semesters Mon - Fri 7.45 - 19.45 (7.45 am - 7.45 pm)
Phone: +358 9 1911 (University switch)
General e-mail: info [at] cs.helsinki.fi
Fax: +358 9 876 4314
