Supervised Machine Learning

Algorithms and machine learning
Advanced studies
We study in particular classification from the point of view of online learning and statistical learning theory. Emphasis is on provable performance guarantees. The algorithms we study include the perceptron and its variants and the support vector machine.


Exam: 26.02.2014 at 09.00, room A111

Year: 2014
Semester: spring
Dates: 15.01.-21.02.
Period: 3-3
Language: English
In charge: Jyrki Kivinen


Time       Room  Lecturer       Date
Wed 10-12  B222  Jyrki Kivinen  15.01.2014-21.02.2014
Fri 10-12  B222  Jyrki Kivinen  15.01.2014-21.02.2014

Exercise groups

Group: 1
Time       Room  Instructor     Date
Thu 14-16  D122  Jyrki Kivinen  20.01.2014-21.02.2014

Information for international students

The course is taught in English.  All material will appear on the English version of this page.


The exam problems and sample solutions are available:

My apologies for the lateness of this update. If you have further questions, please make an appointment by e-mail.


The course Introduction to Machine Learning is not strictly required. However, it includes a lot of very useful motivation, context and other background that we will cover here only briefly.

Students are expected to have basic knowledge of linear algebra, probability theory and calculus. Some multivariate calculus will be needed, but we will briefly cover the necessary tools for those not familiar with them.

Although there are not many specific prerequisites from mathematics, the approach taken on the course is mainly mathematical, and familiarity with mathematical manipulations will help a lot.

Part of the homework will require some programming. Basic programming skills are essential, and familiarity with tools such as Matlab, Octave or R will be very helpful.



The form and content of the course will mostly be the same as in Spring 2012.

The course covers a selection of topics mainly related to (binary) classification. This is a large research area, and the choice of topics is somewhat based on the personal preferences of the lecturer (whose research area is computational learning theory, in particular online learning).

Emphasis will be on provable performance guarantees for learning algorithms.

Table of contents (preliminary):

  1. Introduction
    • basic concepts
    • mathematical frameworks for supervised learning: online and statistical
  2. Online learning
    • combining expert advice
    • linear classification and the perceptron algorithm
    • relative loss bounds (also known as regret bounds)
  3. Statistical learning
    • basic statistical model, connection to online learning
    • complexity measures: VC-dimension and Rademacher complexity
    • Support Vector Machine
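
As a small taste of topic 2, here is a minimal sketch of the perceptron algorithm in Python. This is an illustration only (the variable names and toy data are mine, not from the lecture notes); the course presents the algorithm and its mistake bound in full.

```python
import numpy as np

def perceptron(X, y, epochs=10):
    """Online perceptron: whenever example (x_t, y_t) is misclassified,
    update the weights by w <- w + y_t * x_t.

    X: (n, d) array of inputs; y: labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            if y_t * np.dot(w, x_t) <= 0:  # mistake (or on the boundary)
                w = w + y_t * x_t          # perceptron update
    return w

# Toy linearly separable data: the label is the sign of the first coordinate.
X = np.array([[2.0, 1.0], [1.5, -0.5], [-1.0, 0.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
print(all(np.sign(X @ w) == y))  # True: the learned w separates the data
```

On separable data like this, the perceptron convergence theorem covered in the lectures guarantees that the number of updates is finite, with a bound depending on the margin of the data.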

Completing the course

The grade consists of homework (20%) and exam (80%).

To get credit for homework, you need to turn in your solutions on paper before the weekly exercise session. Details will be announced here soon. There will be no homework session during the first week of lectures.


About the course exam

The course exam is scheduled for Wednesday, 26 February, at 9:00. (The examination schedule contains up-to-date information about all course exams.)

The exam questions will be available in English only.  However, you may answer in English, Finnish or Swedish.

You are allowed to bring a "cheat sheet": one A4-sized sheet, using both sides if you wish, with any notes about the course material that you think may be useful, in your own handwriting.  You should think of preparing your cheat sheet as part of revising the material for the exam.  Of course, if you don't think you need one, you don't have to make one.

You will not need a calculator.

To help in preparing for the exam, you may wish to have a look at some previous exams:

The exam will in principle cover everything discussed in the lectures, lecture notes and homework. However, there will be no problems about Rademacher complexity (pages 170–198 of the lecture notes), since we did not have time to cover it properly in class. Note, however, that all other material covered during the last week of lectures is included in the exam, even though there was no homework based on it.

Typical kinds of exam questions include (but are not limited to)

  • define some mathematical concept, explain how it is used in machine learning
  • explain some algorithm
  • give a proof for some theorem from the lecture notes
  • mathematical problems similar to homework, or actual homework problems.

As usual, some questions will be easier, some more difficult.

Literature and material

The material consists of lecture notes, exercise problems and their solutions, and possibly some additional original articles.  The lecture notes and homework material will appear here as the course progresses.  Additional reading material has been added on a separate tab.


Lecture notes: pages 1–204 are available now.

Additional note on KKT conditions (related to the lecture of Friday 14 February)

Corrections to lecture notes:

  • page 43: included the density of uniform distribution in the integral (23.1.)
  • page 129: fixed a missing "^2" in definition of l^2 (18.2.)
  • page 152: added missing minus sign to definition of W(alpha) (19.2.)

Progress of lectures:

  • Wednesday 15 January: practicalities; background and basic concepts; basics of online learning, Halving Algorithm (pp. 1–28)
  • Friday 17 January: Weighted Majority, online mistake bounds (pp. 29–37)
  • Wednesday 22 January: Weighted Majority vs. Bayes; statistical learning, noise-free PAC model (pp. 38–50)
  • Friday 24 January: agnostic PAC model; the expert framework for online learning (pp. 51–65)
  • Wednesday 29 January: Aggregating Algorithm for absolute loss (pp. 66–78)
  • Friday 31 January: parameter tuning; Perceptron convergence theorem (pp. 79–90, 93–95)
  • Wednesday 5 February: applications of Perceptron convergence theorem, feature mappings (pp. 91–92, 96–102)
  • Friday 7 February: kernel trick, kernelised Perceptron; hinge loss (pp. 103–117)
  • Wednesday 12 February: bounds for the non-separable case; introduction to Support Vector Machine (pp. 118–132)
  • Friday 14 February: introduction to convex optimisation; basic idea of soft margin SVM (pp. 133–145; 155)
  • Wednesday 19 February: primal and dual formulations for hard and soft margin SVM (pp. 146–162)
  • Friday 21 February: Vapnik-Chervonenkis dimension (pp. 163–169), Rademacher complexity (pp. 170–198); summary of the course.  Note: Rademacher complexity was covered only in a very cursory manner and will not be part of the exam.
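
The expert-advice algorithms from the first lectures (Halving Algorithm, Weighted Majority) can be sketched in a few lines. The following is a toy Python illustration, not the lecture notes' exact pseudocode; the data and the choice of beta are illustrative.

```python
import numpy as np

def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Weighted Majority: predict by a weighted vote of the experts,
    then multiply the weight of every expert that erred by beta.

    expert_preds: (T, n) array of {0, 1} predictions of n experts;
    outcomes: length-T array of the true {0, 1} outcomes.
    Returns the number of mistakes made by the algorithm.
    """
    w = np.ones(expert_preds.shape[1])
    mistakes = 0
    for preds_t, outcome_t in zip(expert_preds, outcomes):
        vote_1 = w[preds_t == 1].sum()
        vote_0 = w[preds_t == 0].sum()
        prediction = 1 if vote_1 >= vote_0 else 0
        if prediction != outcome_t:
            mistakes += 1
        w[preds_t != outcome_t] *= beta  # penalise the experts that erred
    return mistakes

# Toy run: expert 0 is always correct, experts 1 and 2 often err.
preds = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 0], [1, 1, 0]])
outcomes = np.array([1, 0, 1, 1])
print(weighted_majority(preds, outcomes))
```

The mistake bounds proved in the lectures show that the algorithm's total number of mistakes is within a constant factor of the best expert's, plus a term logarithmic in the number of experts.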


  1. Thursday, 23 January: problems (available also as LaTeX source); solutions; some Matlab code for problem 4
  2. Thursday, 30 January: problems (available also as LaTeX source); solutions (updated 12 February: fixed some mistakes in the numerical calculations in problem 4)
  3. Thursday, 6 February: problems (available also as LaTeX source); solutions
  4. Thursday, 13 February: problems (available also as LaTeX source); solutions
  5. Thursday, 20 February: problems (available also as LaTeX source); data; solutions; some Matlab code for problem 4

Homework points: Includes points for all homework solutions and the extra points for the sessions (column 6). There were five sessions, each with four regular problems graded 0 to 3, which gives a maximum of 5*4*3 = 60 points for the regular homework, but because of various bonus possibilities, some students actually receive more than this.