Introduction to Machine Learning

Basic information

Course code: 582631

Credit units: 5

Subprogramme: Algorithms and machine learning

Level: Advanced studies

Description:

Basic concepts and methods of machine learning, in theory and in practice. Supervised learning (classification, regression) and unsupervised learning (clustering). The course serves as preparation for various courses on data analysis, machine learning and bioinformatics. Course book: An Introduction to Statistical Learning with Applications in R, Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, Springer, 2013.

Exam

11.12.2013 09.00 B123

Year	Semester	Date	Period	Language	In charge
2013	autumn	29.10-06.12.	2-2	English	Jyrki Kivinen

Lectures

Time	Room	Lecturer	Date
Tue 10-12	D122	Jyrki Kivinen	29.10.2013-06.12.2013
Fri 10-12	D122	Jyrki Kivinen	29.10.2013-06.12.2013
Mon 12-14	C220	Jyrki Kivinen	09.12.2013-09.12.2013

Exercise groups

Group: 1
Time	Room	Instructor	Date	Observe
Fri 14-16	B222	Yuan Zou	04.11.2013-06.12.2013

Notice:

On Tuesday 19th of November the lecture is moved to room B222!

Note:

Registration for this course starts on Tuesday October 8th at 9.00. There is additional guidance for Matlab/R on Tue October 29th at 12-14 in B221 and Fri November 1st at 12-14 in B221.

Information for international students

The course will be taught in English. All materials will appear on the English version of this page.

Announcements

The course has been graded. The course results are available in the department intranet.
Please fill in a feedback form for the course!
Details of the programming assignment for separate examinations are now available in the Examinations tab.

General

The course will cover the basics of machine learning. The course consists of lectures, homework exercises and a course examination.

Machine learning employs a lot of concepts and techniques from mathematics. Students are expected to know the basics of probability theory, linear algebra and calculus. For this course we do not need any advanced techniques, but a general familiarity with mathematical manipulations will make the course easier.

A significant proportion of the exercises will require the use of a computer to implement machine learning algorithms and experiment with them. Most students will probably find it easiest to solve these problems using Matlab, R or similar tools. During the first week, there will be some instruction in the use of such tools (see below for details). It is assumed that all students already have fairly good skills in computer programming in general.

Completing the course

There are two ways of completing the course.

Taking the lecture course in Period II, including homework exercises and a course exam. This option is the main focus of these pages. The homework makes up 40% of the grade, the course exam 60%. In order to pass, you must score at least half of the points in both the homework and the exam. If you have done the homework but are unable to attend the course exam, or do not pass it, you may replace it with a separate examination.
There will be separate examinations according to the usual policy of the department. This option requires that you additionally complete a programming project. The details of the project will be made available well in advance of the first separate exam (currently planned for 4 February 2014).
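
As an illustration of the grading rule for the first option, the 40%/60% weighting and the pass condition can be sketched as below. This is only a sketch under assumptions: the function name and the point maxima in the usage example are hypothetical, and the page does not say how the weighted score maps to a final grade.

```python
def course_result(hw_points, hw_max, exam_points, exam_max):
    """Return (passed, weighted_score), with weighted_score in [0, 1].

    Assumes homework counts 40% and the exam 60%, and that passing
    requires at least half of the points in each part separately.
    The point maxima are hypothetical inputs, not values from the course.
    """
    hw_frac = hw_points / hw_max
    exam_frac = exam_points / exam_max
    # Both parts must reach the 50% threshold independently.
    passed = hw_frac >= 0.5 and exam_frac >= 0.5
    # Weighted combination of the two parts.
    weighted = 0.4 * hw_frac + 0.6 * exam_frac
    return passed, weighted


if __name__ == "__main__":
    # Hypothetical maxima: 60 homework points, 60 exam points.
    print(course_result(45, 60, 36, 60))
    print(course_result(20, 60, 60, 60))
```

Note that in the second call the student does not pass despite a full exam score, because the homework share is below one half.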

See the Examinations tab to get an idea of the type of questions in the exams. (Notice that the option of replacing this course with the similar online course offered by Coursera is no longer available.)

Literature and material

Textbook

The course textbook is Introduction to Data Mining (2005) by Tan, Steinbach and Kumar. We will mainly cover Chapters 1–5 and 8–10. More detailed pointers to the textbook will be posted here as the course progresses. However, the course does not follow the textbook precisely. Students are expected to learn both the material in the assigned parts of the textbook, and the material presented in lectures and exercises.

Lectures

Lecture notes will appear here as the course progresses. They are mainly based on material from previous instances of this course, created by Patrik Hoyer and others.

Tutorial on multivariate distributions (P. Hoyer)
some really quick notes about Bayes error
Week 1: Notes for lecture 1 and lecture 2 are available. Corresponding to this, you should read Chapters 1 and 2 of the textbook.
Week 2: Notes for lecture 3 and lecture 4 are available. Corresponding to this, you should read Sections 3.1, 3.2, 3.3, 4.1, 4.2, 5.2, 5.3.1, 5.3.2, 5.3.4, 5.7 and 5.8 of the textbook. This looks a bit fragmented because we cover topics in a different order than the textbook. Parts of this may become clearer as we get further into the course.
Week 3: Notes for lecture 5 and lecture 6 are available now. From the textbook, you should read Sections 4.4, 4.5, 5.3.3 and 5.6.3. Pages 1–17 of Patrik Hoyer's tutorial on multivariate distributions may also be helpful.
Week 4: Notes for lecture 7 and lecture 8 are available now. From the textbook, you should read Sections 4.3 and 5.1, and Appendix D.
Week 5: Notes for lecture 9 and lecture 10 are available now. From the textbook, you should read Sections 5.4.0–5.4.1 and 8.0–8.3.
Week 6: Notes for lecture 11 are available now. From the textbook, you should read Sections 9.2.2 and 8.5, and Chapter 10.

Homework exercises

There will be compulsory weekly homework consisting of both pen-and-paper and computer exercises. The exercise sessions will be held every Friday, beginning in lecture week 2, and will mainly cover topics from the previous week's lectures. Attendance at the exercise sessions is voluntary, but to get credit you need to hand in your solutions following the instructions on the problem sheet. The deadline is Wednesday at 9:00 am before the session.

Exercise 1 (voluntary computer practice): problems
Exercise 2 (deadline 6 November): problems, solutions, code
Exercise 3 (deadline 13 November): problems, solutions, Matlab code, R code
Exercise 4 (deadline 20 November): problems, solutions, Matlab code, R code
Exercise 5 (deadline 27 November): problems, solutions, Matlab code, R code
Exercise 6 (deadline 4 December): problems, solutions, Matlab code, R code

Exercise points (on department intranet, listed by student number). "LH" means pen-and-paper problems and "HT" programming problems, both with a running numbering so that, for example, "LH5" is pen-and-paper problem 2 in set 2. Column "LH15" is the extra points awarded for willingness to present solutions at the exercise sessions.

Additional material (data sets, tutorials etc.) has been collected on a separate tab.

Address: Department of Computer Science, P.O. 68 (Gustaf Hällströmin katu 2b), FI-00014 UNIVERSITY OF HELSINKI, FINLAND
Opening Hours: During spring and autumn semesters Mon - Fri 7.45 - 19.45 (7.45 am - 7.45 pm)
Phone: +358 9 1911 (University switch)
General e-mail: info [at] cs.helsinki.fi
Fax: +358 9 876 4314
