Introduction to Machine Learning
Exam
Year | Semester | Date | Period | Language | In charge |
---|---|---|---|---|---|
2013 | autumn | 29.10-06.12. | 2-2 | English | Jyrki Kivinen |
Lectures
Time | Room | Lecturer | Date |
---|---|---|---|
Tue 10-12 | D122 | Jyrki Kivinen | 29.10.2013-06.12.2013 |
Fri 10-12 | D122 | Jyrki Kivinen | 29.10.2013-06.12.2013 |
Mon 12-14 | C220 | Jyrki Kivinen | 09.12.2013-09.12.2013 |
Exercise groups
Time | Room | Instructor | Date | Observe |
---|---|---|---|---|
Fri 14-16 | B222 | Yuan Zou | 04.11.2013-06.12.2013 | |
On Tuesday 19th of November the lecture is moved to room B222!
Registration for this course starts on Tuesday October 8th at 9.00. There is additional guidance for Matlab/R on Tue October 29th at 12-14 in B221 and Fri November 1st at 12-14 in B221.
Information for international students
The course will be taught in English. All materials will appear on the English version of this page.
Announcements
- The course has been graded. The course results are available on the department intranet.
- Please fill in a feedback form for the course!
- Details of the programming assignment for separate examinations are now available in the Examinations tab.
General
The course will cover the basics of machine learning. The course consists of lectures, homework exercises and a course examination.
Machine learning employs a lot of concepts and techniques from mathematics. Students are expected to know the basics of probability theory, linear algebra and calculus. For this course we do not need any advanced techniques, but a general familiarity with mathematical manipulations will make the course easier.
A significant proportion of the exercises will require the use of a computer to implement machine learning algorithms and experiment with them. Most students will probably find it easiest to solve these problems using Matlab, R or similar tools. During the first week, there will be some instruction in the use of such tools (see below for details). It is assumed that all students already have fairly good skills in computer programming in general.
Completing the course
There are two ways of completing the course.
- Taking the lecture course in Period II, including homework exercises and a course exam. This option is the main focus of these pages. The homework makes up 40% of the grade and the course exam 60%. In order to pass, you must score at least half of the points in both the homework and the exam. If you have done the homework but are unable to attend the course exam, or do not pass it, you may replace the course exam with a separate examination.
- Taking a separate examination. Separate examinations are arranged according to the usual policy of the department. This option requires that you additionally complete a programming project. The details of the project will be made available well in advance of the first separate exam (currently planned for 4 February 2014).
See the Examinations tab to get an idea of the type of questions in the exams. (Notice that the option of replacing this course with the similar online course offered by Coursera is no longer available.)
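As an informal illustration of the grading rule for the lecture course option, the sketch below combines homework and exam points with the 40%/60% weights and checks the requirement of at least half the points in each part. The function name, the maximum point values and the idea of weighting fractions of the maximum points are assumptions made for this example only; the page does not state how the weighted score is converted into a final grade.

```python
def course_result(hw_points, hw_max, exam_points, exam_max):
    """Return (passed, weighted_score) for the lecture course option.

    Hypothetical helper: the maximum point values are assumed inputs,
    not figures taken from the course page.
    """
    hw_frac = hw_points / hw_max
    exam_frac = exam_points / exam_max
    # Pass rule from the text: at least half of the points in each part.
    passed = hw_frac >= 0.5 and exam_frac >= 0.5
    # Assumed combination: 40% homework and 60% exam, as fractions of the maximum.
    weighted_score = 0.4 * hw_frac + 0.6 * exam_frac
    return passed, weighted_score


# Example with made-up point totals: full homework marks do not
# compensate for an exam score below half of the maximum.
passed, score = course_result(hw_points=60, hw_max=60, exam_points=20, exam_max=60)
print(passed, round(score, 2))
```

Note that the two conditions are checked separately, so a high weighted score alone is not enough to pass.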
Literature and material
Textbook
The course textbook is Introduction to Data Mining (2005) by Tan, Steinbach and Kumar. We will mainly cover Chapters 1–5 and 8–10. More detailed pointers to the textbook will be posted here as the course progresses. However, the course does not follow the textbook precisely. Students are expected to learn both the material in the assigned parts of the textbook, and the material presented in lectures and exercises.
Lectures
Lecture notes will appear here as the course progresses. They are mainly based on material from previous instances of this course, created by Patrik Hoyer and others.
- Tutorial on multivariate distributions (P. Hoyer)
- some really quick notes about Bayes error
- Week 1: Notes for lecture 1 and lecture 2 are available. Corresponding to this, you should read Chapters 1 and 2 of the textbook.
- Week 2: Notes for lecture 3 and lecture 4 are available. Corresponding to this, you should read Sections 3.1, 3.2, 3.3, 4.1, 4.2, 5.2, 5.3.1, 5.3.2, 5.3.4, 5.7 and 5.8 of the textbook. This looks a bit fragmented because we cover topics in a different order than the textbook. Parts of this may become clearer when we get further into the course.
- Week 3: Notes for lecture 5 and lecture 6 are available now. From the textbook, you should read Sections 4.4, 4.5, 5.3.3 and 5.6.3. Pages 1–17 of Patrik Hoyer's tutorial on multivariate distributions may also be helpful.
- Week 4: Notes for lecture 7 and lecture 8 are available now. From the textbook, you should read Sections 4.3 and 5.1, and Appendix D.
- Week 5: Notes for lecture 9 and lecture 10 are available now. From the textbook, you should read Sections 5.4.0–5.4.1 and 8.0–8.3.
- Week 6: Notes for lecture 11 are available now. From the textbook, you should read Sections 9.2.2 and 8.5, and Chapter 10.
Homework exercises
There will be compulsory weekly homework consisting of both pen-and-paper and computer exercises. The exercise sessions will be held every Friday starting from lecture week 2, and will mainly cover topics from the previous week's lectures. Attendance at the exercise sessions is voluntary, but to get credit you need to hand in your solutions following the instructions on the problem sheet. The deadline is 9:00 am on the Wednesday before the session.
- Exercise 1 (voluntary computer practice): problems
- Exercise 2 (deadline 6 November): problems, solutions, code
- Exercise 3 (deadline 13 November): problems, solutions, Matlab code, R code
- Exercise 4 (deadline 20 November): problems, solutions, Matlab code, R code
- Exercise 5 (deadline 27 November): problems, solutions, Matlab code, R code
- Exercise 6 (deadline 4 December): problems, solutions, Matlab code, R code
Exercise points (on department intranet, listed by student number). "LH" means pen-and-paper problems and "HT" programming problems, both with a running numbering so that, for example, "LH5" is pen-and-paper problem 2 in set 2. Column "LH15" is the extra points awarded for willingness to present solutions at the exercise sessions.
Additional material (data sets, tutorials etc.) has been collected on a separate tab.