Introduction to Machine Learning

Course code: 582631
Credits: 5
Specialisation: Algorithms and machine learning
Level: Advanced studies
Description: Basic concepts and methods of machine learning, in theory and in practice. Supervised learning (classification, regression) and unsupervised learning (clustering). The course serves as preparation for various courses on data analysis, machine learning and bioinformatics. Course book: An Introduction to Statistical Learning with Applications in R, Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, Springer, 2013.
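
As a quick illustration of the two settings mentioned in the description, the sketch below fits one supervised model (a regression) and runs one unsupervised method (clustering) in R, one of the tools suggested for the exercises. This is only an informal sketch on a built-in toy data set, not course material; the data and model choices are arbitrary.

  # Minimal sketch (not course material): supervised vs. unsupervised learning
  # on R's built-in iris data.  The choice of data and models is arbitrary.
  data(iris)

  # Supervised learning (regression): learn a mapping from an input to a known
  # target variable, here predicting petal width from petal length.
  fit <- lm(Petal.Width ~ Petal.Length, data = iris)
  coef(fit)

  # Unsupervised learning (clustering): group the same observations into three
  # clusters without using any target variable or labels.
  set.seed(1)
  km <- kmeans(iris[, c("Petal.Length", "Petal.Width")], centers = 3)
  table(cluster = km$cluster, species = iris$Species)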

Exam

17.12.2014 at 09.00, room B123

Year  Semester  Dates         Period  Language  In charge
2014  autumn    28.10-12.12.  2-2     English   Jyrki Kivinen

Lectures

Time       Room   Lecturer       Dates
Tue 10-12  CK112  Jyrki Kivinen  28.10.2014-14.11.2014
Fri 10-12  CK112  Jyrki Kivinen  28.10.2014-12.12.2014
Tue 10-12  C222   Jyrki Kivinen  18.11.2014-18.11.2014
Tue 10-12  CK112  Jyrki Kivinen  24.11.2014-12.12.2014

Exercise groups

Group: 1

Time       Room  Instructor          Dates                  Note
Thu 10-12  B222  Johannes Verwijnen  03.11.2014-12.12.2014

The lectures are in auditorium CK112, except on Tuesday 18 November, when the lecture is in C222.

Registration for this course starts on Tuesday October 7th at 9.00. There is additional guidance on Matlab/R on Tuesday October 28th at 12-14 and on Friday October 31st at 12-14, both in room B221.

General

The course has ended.

In KURKI (the department's course management system), the points for homework exercises have been multiplied by 5 to simplify bookkeeping related to our late homework policy.  In the end, the points will still be scaled so they correspond to 40% of the course maximum score.
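
To make the bookkeeping concrete, here is a small sketch with made-up numbers: only the factor of 5 and the 40% weight come from the course arrangements, while the maximum scores below are hypothetical.

  # Hypothetical numbers illustrating the KURKI bookkeeping and the final scaling.
  hw_raw     <- 21    # homework points earned (hypothetical)
  hw_max     <- 30    # maximum homework points (hypothetical)
  course_max <- 60    # course maximum score (hypothetical)

  kurki_points <- hw_raw * 5                          # value shown in KURKI
  hw_final     <- 0.4 * course_max * hw_raw / hw_max  # homework scaled to 40% of the maximum
  hw_final                                            # 16.8 points out of a possible 24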

The course will cover the basics of machine learning.  The course consists of lectures, homework exercises and a course examination.

Machine learning employs a lot of concepts and techniques from mathematics. Students are expected to know the basics of probability theory, linear algebra and calculus. For this course we do not need any advanced techniques, but a general familiarity with mathematical manipulations will make the course easier.

A significant proportion of the exercises will require the use of a computer to implement machine learning algorithms and experiment with them. Most students will probably find it easiest to solve these problems using Matlab, R or similar tools. During the first week, there will be some instruction in the use of such tools (see the Matlab/R guidance sessions listed above). It is assumed that all students already have fairly good skills in computer programming in general.
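
As a rough indication of what such an exercise might involve, here is a sketch (not an actual homework problem; the function name and the use of the iris data are made up for illustration) of a 1-nearest-neighbour classifier written from scratch in R, together with a quick experiment on a random train/test split.

  # Sketch of "implement and experiment": a 1-nearest-neighbour classifier.
  nn1_predict <- function(train_x, train_y, test_x) {
    # For each test point, return the label of the closest training point
    # (closest in squared Euclidean distance).
    apply(test_x, 1, function(z) {
      d <- rowSums(sweep(train_x, 2, z)^2)
      train_y[which.min(d)]
    })
  }

  # Quick experiment: random train/test split of the iris data, report accuracy.
  data(iris)
  set.seed(42)
  X   <- as.matrix(iris[, 1:4])
  y   <- as.character(iris$Species)
  idx <- sample(nrow(iris), 100)             # 100 training points, 50 test points
  pred <- nn1_predict(X[idx, ], y[idx], X[-idx, ])
  mean(pred == y[-idx])                      # proportion of correct predictions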

Completing the course

There are two ways of completing the course.

  1. Taking the lecture course in Period II, including homework exercises and a course exam.  This option is the main focus of these pages.  The homework makes up 40% of the grade, the course exam 60%. In order to pass, you must score at least half the points both from homework and the exam. If you have done the homework but are unable to attend the course exam, or do not pass it, you may replace it by a separate examination.
  2. There will be separate examinations according to the usual policy of the department. This option requires that you additionally complete a programming project. The details of the project will be made available well in advance of the first separate exam.

This web page is mainly about the first option, taking the lecture course in period II, Autumn 2014.


Literature and material

Homework problems and related material are on a separate tab.

Textbook

The course textbook is Introduction to Data Mining (2005 or 2013 edition) by Tan, Steinbach and Kumar.  We will mainly cover Chapters 1–5 and 8–10.  More detailed pointers to the textbook will be posted here as the course progresses.  However, the course does not follow the textbook precisely.  Students are expected to learn both the material in the assigned parts of the textbook, and the material presented in lectures and exercises.

Lectures

Lecture notes will appear here as the course progresses. They are mainly based on material from previous instances of this course, created by Patrik Hoyer and others.

  • Week 1: Notes for lecture 1 and lecture 2 are available. Corresponding to this, you should read Chapters 1 and 2 of the textbook.
  • Week 2: Notes for lecture 3 and lecture 4 are available. Corresponding to this, you should read Sections 3.1, 3.2, 3.3, 4.1, 4.2, 5.2, 5.3.1, 5.3.2, 5.3.4, 5.7 and 5.8 of the textbook. This looks a bit fragmented because we cover topics in a different order than the textbook. Parts of this may become clearer as we get further along in the course.
  • Week 3: Notes for lecture 5 and lecture 6 are available. From the textbook, you should read Sections 4.4, 4.5, 5.3.3 and 5.6.3. Pages 1–17 of Patrik Hoyer's tutorial on multivariate distributions may also be helpful. (We used a large part of Lecture 5, on Tuesday 11 November, for additional discussion of Bayes error and related issues that seemed unclear to many of the students. On Friday we continued from Naive Bayes, page 122 of the lecture notes.)
  • Week 4: Notes for lecture 7 and lecture 8 are available now. From the textbook, you should read Sections 4.3 and 5.1, and Appendix D. The lectures in Week 3 got as far as p. 140 of the lecture notes. We will start Week 4 with some more discussion of overfitting and model selection.
  • Week 5: A new version of the notes for lecture 9 has been added (27 November). We will also discuss linear regression (in the slide set for lecture 8) in much more detail than we had time for in week 4. Read Section 5.4.1 from the textbook.
  • Week 6: Notes for lecture 10 and lecture 11 are available now. Read Sections 8.0–8.3 from the textbook.
  • Week 7: The last set of lecture notes includes some additional material about unsupervised learning that we will discuss on Tuesday but which will not be part of the exam. The last lecture (Friday 12 December) will be devoted to a quick summary of the course, followed by discussion of any questions the students may have. Old exams will be provided here to give an idea of what kind of questions to expect.

IRC

During the course you can use the IRCnet channel #tkt-iml for course-related discussion. The course assistant (Johannes) will be online using the nick duvin. Some guides on IRC are available in Finnish and in English.