Introduction to Machine Learning
Exam
Year  Semester  Date  Period  Language  Instructor in charge
2016  autumn  01.11.-16.12.  2-2  English  Teemu Roos
Lectures
Time  Room  Lecturer  Date
Tue 10-12  CK112  Teemu Roos  01.11.2016-16.12.2016
Fri 10-12  CK112  Teemu Roos  01.11.2016-16.12.2016
Exercise groups
Time  Room  Instructor  Date  Notes
Thu 14-16  D122  Ville Hyvönen  07.11.2016-16.12.2016
Fri 12-14  B222  Ville Hyvönen  11.11.2016-16.12.2016
Fri 12-14  D123  Teemu Roos  07.11.2016-16.12.2016
Registration for this course starts on Tuesday October 4th at 9.00.
General
Updates
 Jan 11: Results (requires CS department authentication). Grading criteria for the exam will be made available soon.
 Jan 10: NEW: Instructions for the project (to be completed if you are taking a separate exam) are available; see below
 Dec 19: IMPORTANT: PLEASE GIVE FEEDBACK THROUGH THE ANONYMOUS FEEDBACK FORM. We want to hear from every single student who participated (in any way). Especially your written comments are useful. All feedback is welcome.
 Dec 16: Exam workshop (for you to discuss course topics and prepare for the exam on Tuesday) on Monday 19th December, 4pm, Exactum 4th floor corridor
 Dec 15: A couple of old exams from previous years are given (see below)
 Dec 9: 17:55 Problem set 6 available (see below)
 Dec 2: For next week, read Section 10 (Unsupervised learning)
 Nov 25: Friday's exercises will be held in duplicate from this week onwards: please use rooms D123 and B222 (as evenly as possible)
 Nov 23: Weekly unofficial (i.e., no teaching staff available) exercise workshops on Mondays at 4.15pm at Exactum 4th floor corridor. Please check it out!
Note: We will occasionally do online quizzes at the lectures. (They are done anonymously and they will not be graded.) Please bring a laptop or a smartphone with you to the lectures so that you can complete the quizzes.
Lectures
Lecture 1 (Nov 1): What is Machine Learning? Course logistics
 slides (pdf)  quiz
 R tutorials (see book pages 42-51)
Lecture 2 (Nov 4): Evaluating performance
 slides (pdf)  quiz
 what is statistical learning, models and data, evaluating performance
 textbook pages 1-33
Lectures 3-4 (Nov 8 & 11): Linear regression & Evaluating performance II
 slides (pdf)
 linear regression, bias-variance tradeoff, overfitting, cross-validation
 textbook pages 33-42 and some bits from Sec. 3 such as Sec. 3.3.2 (but note that we only cover a small part of Sec. 3)
 see also pages 175-186 (cross-validation)
Lectures 5-6 (Nov 15 & 18): Classification
 slides (pdf)
 classification: logistic regression, linear and quadratic discriminant analysis (LDA and QDA), ...
 book pages 127-167 (except 145-149)
Lectures 7-8 (Nov 22 & 25): Classification II
 slides (pdf) (Tuesday: pages 1-5, Friday: pages 6-24; the rest will be covered next week)
 classification (continued): naive Bayes, kNN, decision trees
 textbook pages 303-316 and 337-364, and in addition naive Bayes, which is not covered in the book
Lectures 9-10 (Nov 29 & Dec 2): Classification III
 slides (pdf)
 classification (continued): decision trees continued, support vector machine (SVM)
 textbook Section 9
Lecture 11 (Dec 9): Clustering
 slides (pdf)
 clustering (flat and hierarchical); k-means, agglomerative clustering
 textbook Section 10
Lecture 12 (Dec 9): Principal Component Analysis
Lecture 13 (Dec 16): Ensemble Methods
 slides (pdf)

 resampling methods (bootstrap), ensemble methods (cross-validation (revisited), bagging, random forests, ...), examples of real-world ML
Exercises
 Problems (pdf), due Nov 10-11  example solutions pdf + R scripts
 Problems (pdf), due Nov 17-18  example solutions pdf + R scripts
 Problems (pdf), due Nov 24-25  example solutions pdf + R scripts
 Problems (pdf), due Dec 1-2  example solutions pdf + R scripts
 Problems (pdf), due Dec 8-9  example solutions pdf + R script1 script2
 Problems (pdf), due Dec 15-16  example solutions pdf + R scripts
Completing the course
The grade will depend on exercise points (40% of the grade) and exam points (60% of the grade). For full exercise points, you need to have completed 5/6 of the available points. In addition, you must get at least half of the available exercise points, and likewise, at least half of the available exam points to pass.
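The arithmetic of the rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the official grading procedure: the function name `course_score` is hypothetical, the linear scaling of exercise credit below the 5/6 threshold is an assumption, and the mapping from the weighted score to a final grade is not specified on this page.

```python
def course_score(exercise_points, exercise_max, exam_points, exam_max):
    """Return (passed, weighted_score), with weighted_score in [0, 1].

    Hypothetical helper illustrating the stated rules; not the official
    grading procedure.
    """
    # Pass requirement: at least half of the exercise points AND
    # at least half of the exam points.
    passed = (2 * exercise_points >= exercise_max
              and 2 * exam_points >= exam_max)

    # Full exercise credit is reached at 5/6 of the available points.
    # ASSUMPTION: credit grows linearly below that threshold.
    exercise_credit = min(1.0, (6 * exercise_points) / (5 * exercise_max))

    exam_credit = exam_points / exam_max

    # Exercises count for 40% and the exam for 60%.
    weighted_score = 0.4 * exercise_credit + 0.6 * exam_credit
    return passed, weighted_score
```

For example, under these assumptions, 50 of 60 exercise points (exactly 5/6, so full exercise credit) and 30 of 60 exam points (exactly half) would pass with a weighted score of about 0.7.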
The course exam is on December 20th at 8.00am. The first separate exam after the course exam will be a re-exam in which your exercise points are still valid. Students who would like to take a separate exam without having at least half of the exercise points will have to complete a set of small projects; see below.
NB: You are allowed to take a calculator and a "cheat sheet" with you to the exam(s). The cheat sheet is a two-sided, handwritten A4 sheet on which you can write any information whatsoever. The purpose of this is to a) make you review the course contents and write down a condensed summary of the things you may not remember otherwise, and b) spare you from memorizing equations and facts that you could easily look up if you needed them in practice, so that you can focus more on "deep learning" (pun intended), i.e., understanding rather than memorizing. You can get your cheat sheet back after the exam has been marked.
Finally, please write your exam solutions in c l e a r handwriting. It's hard to grade your solution if we have to guess what it says. You can answer in Finnish, Swedish or English.
Exams
You may want to take a look at some of the exams from the last couple of years. However, be aware that the contents and the emphasis of the course have changed, which will also be reflected in the exams. Some additional remarks are marked with "[TR]" and yellow highlighting.
 Course exam December 20, 2016 (pdf)  grading criteria (pdf)
 Separate exam September 13, 2016 (pdf)
 Separate exam April 17, 2015 (pdf)
NEW: Completing the course by a project + separate exam
If you did not get at least 50% of the available exercise points, you can complete a separate project and take a separate exam. In this case, the grade is determined 20% by the project and 80% by the exam points.

Instructions for the project (pdf)
 Newsgroup data set
 Movielens data set
 code for Jaccard coefficient: Matlab, R
Note:
 The deadline for returning the project is one week before the exam.
 When it says 'implement', it means you should write the algorithm yourself, not use an existing library that may be available in R, Python, or other environments.
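For reference, the Jaccard coefficient mentioned in the project materials is, in its standard set form, the size of the intersection of two sets divided by the size of their union. The following Python sketch shows that definition written from scratch, without a library. It is not the linked Matlab or R code, which may use a different input format; the function name `jaccard` and the choice to return 0.0 for two empty sets are assumptions made for this illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two collections.

    Hypothetical helper for illustration; inputs are converted to sets,
    so duplicate items are ignored.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # ASSUMPTION: define the coefficient of two empty sets as 0.0
        # (the ratio 0/0 is otherwise undefined).
        return 0.0
    return len(set_a & set_b) / len(union)
```

For instance, `jaccard({1, 2, 3}, {2, 3, 4})` has two shared items out of four distinct items, giving 0.5.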
Literature and material
The course will use the following textbook:
An Introduction to Statistical Learning with Applications in R
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
Springer, 2013.
A full-text pdf is freely available here.
Topics in the book that are not discussed in the lecture slides will not be required for the exam.
Additional Material
(not required but can be helpful/interesting)
 First and foremost: please join the Piazza Q & A forum dedicated to this course. You should have received a registration link; if not, please ask the lecturer.
 Quora: Machine Learning
 CrossValidated

probability preliminaries:
 Grinstead & Snell, Introduction to Probability (pdf)
 Todennäköisyyslaskenta II (course at the math & stats dept., in Finnish; scroll down to Kurssimoniste)

linear regression:
 Marin & Hamadani, Multiple Linear Regression in R (YouTube)
 for a derivation of least squares formula, see e.g. Sec. 3.2 "Linear Regression Models and Least Squares" in Hastie, Tibshirani & Friedman, The Elements of Statistical Learning, 2001 (see below)
 for a motivation of the least squares formula (as a maximum likelihood estimator), see e.g., "Equivalence between least squares and MLE in Gaussian model" on CrossValidated.
 principal component analysis:

bootstrap:

Jeremy Orloff and Jonathan Bloom, Bootstrap confidence intervals (pdf), MIT OpenCourseWare, Spring 2014

Additional References
 Hastie, Tibshirani & Friedman, The Elements of Statistical Learning, Springer, 2001. Full-text pdf available for free.