Introduction to Machine Learning

Course code: 582631
Credits: 5
Study module: Algorithms and machine learning
Level: Advanced studies
Basic concepts and methods of machine learning, in theory and in practice: supervised learning (classification, regression) and unsupervised learning (clustering). The course serves as preparation for various courses on data analysis, machine learning, and bioinformatics. Course book: An Introduction to Statistical Learning with Applications in R, Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, Springer, 2013.

Exam

20.12.2016 08.00 B123, CK112
Year Semester Date Period Language In charge
2016 autumn 01.11-16.12. 2-2 English Teemu Roos

Lectures

Time Room Lecturer Date
Tue 10-12 CK112 Teemu Roos 01.11.2016-16.12.2016
Fri 10-12 CK112 Teemu Roos 01.11.2016-16.12.2016

Exercise groups

Group: 1
Time Room Instructor Date Notes
Thu 14-16 D122 Ville Hyvönen 07.11.2016-16.12.2016
Group: 2
Time Room Instructor Date Notes
Fri 12-14 B222 Ville Hyvönen 11.11.2016-16.12.2016
Group: 3
Time Room Instructor Date Notes
Fri 12-14 D123 Teemu Roos 07.11.2016-16.12.2016

Registration for this course starts on Tuesday October 4th at 9.00.

General

Updates

  • Jan 11: Results (requires CS department authentication). Grading criteria for the exam will be made available soon.
  • Jan 10: NEW: Instructions for the project (to be completed if you are taking a separate exam) are available; see below.
  • Dec 19: IMPORTANT: PLEASE GIVE FEEDBACK THROUGH THE ANONYMOUS FEEDBACK FORM. We want to hear from every single student who participated (in any way). Especially your written comments are useful. All feedback is welcome.
  • Dec 16: Exam workshop (for you to discuss course topics and prepare for the exam on Tuesday) on Monday 19th December, 4pm, Exactum 4th floor corridor
  • Dec 15: A couple of exams from previous years are now available (see below)
  • Dec 9, 17:55: Problem set 6 available (see below)
  • Dec 2: For next week, read Section 10 (Unsupervised learning)
  • Nov 25: From this week onwards, Friday's exercises will be held in two rooms: please split between D123 and B222 as evenly as possible
  • Nov 23: Weekly unofficial exercise workshops (no teaching staff present) on Mondays at 4.15pm in the Exactum 4th floor corridor. Please check them out!

Note: We will occasionally run online quizzes during the lectures. (They are anonymous and will not be graded.) Please bring a laptop or a smartphone to the lectures so that you can complete the quizzes.


Lectures

Lecture 1 (Nov 1): What is Machine Learning? Course logistics
Lecture 2 (Nov 4): Evaluating performance
  • slides (pdf) | quiz
  • what is statistical learning, models and data, evaluating performance
  • textbook pages 1-33
Lectures 3-4 (Nov 8 & 11): Linear regression & Evaluating performance II
  • slides (pdf)
  • linear regression, bias-variance tradeoff, overfitting, cross-validation
  • textbook pages 33-42 and some bits from Sec. 3 such as Sec. 3.3.2 (but note that we only cover a small part of Sec. 3)
  • see also pages 175-186 (cross-validation)
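The cross-validation procedure covered in these lectures is simple enough to state in a few lines. The course works in R, but the idea is language-independent; the following Python sketch (function and variable names are my own, not from the course material) shows the k-fold scheme:

```python
import random

def k_fold_cv(xs, ys, fit, error, k=5, seed=0):
    """Estimate test error by k-fold cross-validation.

    fit(train_xs, train_ys) -> model
    error(model, test_xs, test_ys) -> float
    """
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)          # random fold assignment
    folds = [idx[i::k] for i in range(k)]     # k disjoint validation folds
    errs = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errs.append(error(model,
                          [xs[i] for i in fold],
                          [ys[i] for i in fold]))
    return sum(errs) / k                      # mean validation error
```

With k equal to the number of data points this becomes leave-one-out cross-validation (LOOCV), also discussed in the textbook pages above.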
Lectures 5-6 (Nov 15 & 18): Classification
  • slides (pdf)
  • classification: logistic regression, linear and quadratic discriminant analysis (LDA and QDA), ...
  • book pages 127-167 (except 145-149)
Lectures 7-8 (Nov 22 & 25): Classification II
  • slides (pdf) (Tuesday: pages 1-5, Friday: pages 6-24; the rest will be covered next week)
  • classification (continued): naive Bayes, k-NN, decision trees
  • textbook pages 303-316 and 337-364; in addition, naive Bayes, which is not covered in the book
Lectures 9-10 (Nov 29 & Dec 2): Classification III
  • slides (pdf)
  • classification (continued): decision trees continued, support vector machine (SVM)
  • textbook Section 9
Lecture 11 (Dec 9): Clustering
  • slides (pdf)
  • clustering (flat and hierarchical); k-means, agglomerative clustering
  • textbook Section 10
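The k-means procedure from this lecture can be sketched compactly. This is a plain-Python illustration of Lloyd's algorithm (not course code; points are assumed to be tuples of equal dimension):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)               # initialize with k data points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assignment step: each point goes to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # update step: each center moves to the mean of its cluster
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:                # converged: centers stable
            break
        centers = new_centers
    return centers, clusters
```

Each iteration alternates between assigning points to their nearest center and moving each center to the mean of its cluster; the algorithm stops when the centers no longer change.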
Lecture 12 (Dec 13): Principal Component Analysis
Lecture 13 (Dec 16): Ensemble Methods
  • slides (pdf)
  • resampling methods (bootstrap, cross-validation revisited), ensemble methods (bagging, random forests, ...), examples of real-world ML


Exercises

  1. Problems (pdf), due Nov 10-11 | example solutions pdf + R scripts
  2. Problems (pdf), due Nov 17-18 | example solutions pdf + R scripts
  3. Problems (pdf), due Nov 24-25 | example solutions pdf + R scripts
  4. Problems (pdf), due Dec 1-2 | example solutions pdf + R scripts
  5. Problems (pdf), due Dec 8-9 | example solutions pdf + R scripts
  6. Problems (pdf), due Dec 15-16 | example solutions pdf + R scripts

Completing the course

The grade is based on exercise points (40% of the grade) and exam points (60% of the grade). For full exercise points, you need to earn 5/6 of the available points. To pass, you must also earn at least half of the available exercise points and at least half of the available exam points.
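As a concrete reading of this rule (the point totals below are hypothetical; only the 40/60 weighting, the 5/6 full-points threshold, and the half-points pass requirements come from the course page):

```python
def course_grade_fraction(ex_points, ex_max, exam_points, exam_max):
    """Combine exercise and exam scores per the course rules:
    40% exercises (5/6 of the maximum already counts as full points),
    60% exam; both components must reach at least 50% to pass."""
    if ex_points < ex_max / 2 or exam_points < exam_max / 2:
        return None  # fail: below half the points in one component
    ex_frac = min(ex_points / (ex_max * 5 / 6), 1.0)  # cap at full credit
    return 0.4 * ex_frac + 0.6 * (exam_points / exam_max)
```

For example, 25 of 30 exercise points (exactly 5/6, so full exercise credit) and 45 of 60 exam points would give 0.4 * 1.0 + 0.6 * 0.75 = 0.85 of the maximum grade.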

The course exam is on December 20th at 8.00am. The first separate exam after the course exam is a renewal exam, in which your exercise points are still valid. Students who want to take a separate exam without having at least half of the exercise points must complete a set of small projects; see below.

NB: You are allowed to take a calculator and a "cheat sheet" with you to the exam(s). The cheat sheet is a two-sided, handwritten A4 sheet on which you can write any information whatsoever. Its purpose is to (a) make you review the course contents and write down a condensed summary of the things you might not otherwise remember, and (b) spare you from memorizing equations and facts that you could easily look up in practice, so you can focus on "deep learning" (pun intended), i.e., understanding rather than memorizing. You can get your cheat sheet back after the exam has been graded.

Finally, please write your exam solutions in  c l e a r  handwriting: it is hard to grade a solution if we have to guess what it says. You can answer in Finnish, Swedish or English.


Exams

You may want to take a look at some of the exams from the last couple of years. However, be aware that the contents and the emphasis of the course have changed, and this will also be reflected in the exams. Some additional remarks are marked with "[TR]" and yellow highlighting.


NEW: Completing the course by a project + separate exam

If you did not get at least 50% of the available exercise points, you can complete a separate project and take a separate exam. In that case, the grade is determined 20% by the project and 80% by the exam points.

Note:

  1. The deadline for returning the project is one week before the exam.
  2. When a project task says 'implement', it means you should write the algorithm yourself, not use an existing implementation that may be available in R, Python, or other environments.
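To illustrate the distinction with a hypothetical task (the actual tasks are specified in the project instructions): calling a library routine such as knn from R's class package would not count as implementing, whereas a from-scratch version along these lines would:

```python
from collections import Counter
import math

def knn_predict(train_x, train_y, query, k=3):
    """k-nearest-neighbour classification written from scratch,
    with no machine-learning library calls."""
    # distance from the query to every training point, paired with its label
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    # majority vote among the labels of the k nearest points
    top_labels = [y for _, y in dists[:k]]
    return Counter(top_labels).most_common(1)[0][0]
```

The point is that every step of the algorithm (distance computation, neighbour selection, majority vote) is spelled out explicitly rather than delegated to a library.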


Literature and material

The course will use the following textbook:

An Introduction to Statistical Learning with Applications in R
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
Springer, 2013.

A fulltext pdf is freely available here.

Topics in the book that are not discussed in the lecture slides will not be required for the exam.

Additional Material

(not required but can be helpful/interesting)

Additional References

  • Hastie, Tibshirani & Friedman, The Elements of Statistical Learning, Springer, 2001. Fulltext pdf available for free.