University of Helsinki Department of Computer Science

Department of Computer Science

Department information


58147-9 Machine Learning, Spring 2005

The course is over, but you can still give feedback.


The course consists of traditional lectures, exercice sessions and an examination. Some of the exercises include simple computer experiments, but the main focus in on developing the theory.

Jyrki Kivinen 25.1.-13.4. Tue 16-18, Wed 14-16 D122
Exercise sessions
Jussi Lindgren 31.1.-22.4. Wed 16-18 B119
Fri 29.4. at 14-18 A111

Course material

Course description (in Finnish)

There is no textbook. The examination will be based on material that has been covered in the lectures and exercises. Some suggestions for additional reading are at the end of this page.

Lecture notes: PostScript four slides per sheet, for printing; PDF for viewing on screen. A paper version is available in the course folder in room C127 for copying. (Currently Chapter 1 is still in Finnish, the rest in English; sorry for the confusion.)


  1. homework 2.2.: problems PS, PDF, solutions.
  2. homework 9.2.: problems PS, PDF, solutions.
  3. homework 16.2.: problems PS, PDF; solutions.
  4. homework 23.2.: problems PS, PDF; solutions.
  5. homework 2.3.: problems PS, PDF; solutions
    (the URL's in the paper have a typo: "jtlindgre" should be "jtlindgr")
    data Matlab, R
  6. homework 9.3.: problems PS, PDF; solutions
  7. homework 16.3.: problems PS, PDF; solutions
  8. homework 23.3.: problems PS, PDF; solutions
  9. homework 6.4.: problems PS, PDF; solutions
  10. homework 13.4.: problems PS, PDF; solutions
    OSU SVM Toolbox
    Artificial data for Problem 4
  11. homework 20.4.: problems PS, PDF; solutions
All solutions in one file

Additional reading material

This listing is given for those who find the quality of presentation in the lecture notes inadequate and/or wish to know more. They are not part of the course requirements.

General Textbooks

The recommended general presentation of machine learning is

Tom Mitchell: Machine Learning, McGraw Hill 1997.
The standard textbook for computational learning theory is
Michael J. Kearns and Umesh V. Vazirani: An Introduction to Computational Learning Theory, MIT Press 1994
but it has somewhat different bias than our course.

There are various books on support vector machines and related techniques, among which

John Shawe-Taylor and Nello Cristianini: Kernel Methods for Pattern Analysis, Cambridge University Press 2004
is very recent and similar in spirit to our course. The earlier book by the same authors, An Introduction to Support Vector Machines, is also good.

Online learning

Remark: Our reading list for online learning is rather diverse. This reflects the fact that online learning is not as established as the other topics of the course. There are no text books and few survey articles. Also, this happens to be the personal favourite of the lecturer. For the other parts of the course, we can hopefully collect a more compact list of reference articles.

The Weighted Majority algorithm is analysed in

N. Littlestone and M. K. Warmuth: The weighted majority algorithm, Information and Computation 108(2):212-261, February 1994
and the Weighted Average in
Jyrki Kivinen and Manfred K. Warmuth: Averaging expert predictions, in Proc. 4th European Conference on Computational Learning Theory, pages 153-167, Springer LNAI 1572, Berlin, March 1999.
The Aggregating Algorithm is due to Vovk. Perhaps the most accessible of his articles on this topic is
V. Vovk: A game of prediction with expert advice, Journal of Computer and System Sciences 56(2):153-173, April 1998.
However, it might be easier to start from
Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire and Manfred K. Warmuth: How to use expert advice, Journal of the ACM 44(3):427 - 485, May 1997,
which gives a detailed description of the absolute loss case (including the doubling trick) or
David Haussler, Jyrki Kivinen and Manfred K. Warmuth: Sequential prediction of individual sequences under general loss functions, IEEE Transactions on Information Theory 44(5):1906-1925, September 1998.
that covers the "nice" loss functions (square, log etc.). The tutorial
Jyrki Kivinen: Online learning of linear classifiers, In S. Mendelson and A. J. Smola, editors, Advanced Lectures on Machine Learning, pages 235-257, Springer LNCS 2600, Berlin, January 2003
covers a lot of the recent theory of online linear classification, including p-norm Perceptrons, the Marginalised Perceptron and more. The examples of kernels are taken from Chapter 9 of the book by Shawe-Taylor and Cristianini (see above). For the original Winnow algorithm, see
Nick Littlestone: Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm, Machine Learning 2(4):285-318, April 1988.
Our analysis of Winnow is based on techniques from
N. Cesa-Bianchi: Analysis of two gradient-based algorithms for on-line regression, Journal of Computer and System Sciences 59(3):392-411, 1999.
The more advanced techniques for tuning the learning rate online are explained in
Peter Auer, Nicolò Cesa-Bianchi and Claudio Gentile: Adaptive and self-confident on-line learning algorithms, Journal of Computer and System Sciences 64(1):48-75, February 2002.

Statistical learning theory

The test set bound and Occam's Razor bound are from

John Langford: Practical Prediction Theory for Classification (a tutorial presented at ICML 2003).
Our treatment of Rademacher complexity follows mainly the textbook of Shawe-Taylor and Cristianini. The proof of Theorem 3.24 is based on
Matti Kääriäinen: Relating the Rademacher and VC bounds. University of Helsinki, Department of Computer Science, Report C-2004-57, 2004.
For the more background on statistical learning theory, see the tutorial
Shahar Mendelson: A few notes on statistical learning theory. In Advanced Lectures on Machine Learning, pages 1-40, Springer LNCS 2600, January 2003.
The conversion from online to statistical learning is given in
Nicolò Cesa-Bianchi, Alex Conconi and Claudio Gentile: On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory 50(9):2050-2057, September 2004.

Support vector machines

We followed mainly Section 7.2 of the textbook Shawe-Taylor and Cristianini (2004) cited above. The background on optimisation theory is based on

Stephen Boyd and Lieven Vandenberghe: Convex Optimization. Cambridge University Press 2004
that is also freely available in electronic form.


This section was almost entirely based on the highly recommended survey article

Ron Meir and Gunnar Rätsch: An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning, pages 118-183, Springer LNCS 2600, January 2003.
For additional material, see the extensive list of references in the survey.

Other resources

Journals and conferences

Courses at other universities: Misc. articles: Other sites of interest:

Jyrki Kivinen