581479 Machine Learning, Spring 2005
The course is over, but you can still give feedback.
 exam problems
 exam problems in Finnish
 (some kind of) model solutions to the exam will appear here soon

results
Instruction
The course consists of traditional lectures, exercise sessions and an examination. Some of the exercises include simple computer experiments, but the main focus is on developing the theory.
 Lectures
 Jyrki Kivinen 25.1.–13.4., Tue 16–18 and Wed 14–16, D122
 Exercise sessions
 Jussi Lindgren 31.1.–22.4., Wed 16–18, B119
 Examination
 Fri 29.4. at 14–18, A111
Course material
Course description (in Finnish)
There is no textbook. The examination will be based on material that has been covered in the lectures and exercises. Some suggestions for additional reading are at the end of this page.
Lecture notes: PostScript (four slides per sheet, for printing); PDF (for viewing on screen). A paper version is available for copying in the course folder in room C127. (Currently Chapter 1 is still in Finnish and the rest in English; sorry for the confusion.)
Exercises:
 homework 2.2.: problems PS, PDF; solutions
 homework 9.2.: problems PS, PDF; solutions
 homework 16.2.: problems PS, PDF; solutions
 homework 23.2.: problems PS, PDF; solutions
 homework 2.3.: problems PS, PDF; solutions (the URLs in the paper have a typo: "jtlindgre" should be "jtlindgr"); data: Matlab, R
 homework 9.3.: problems PS, PDF; solutions
 homework 16.3.: problems PS, PDF; solutions
 homework 23.3.: problems PS, PDF; solutions
 homework 6.4.: problems PS, PDF; solutions
 homework 13.4.: problems PS, PDF; solutions; OSU SVM Toolbox; artificial data for Problem 4
 homework 20.4.: problems PS, PDF; solutions
Additional reading material
This listing is provided for those who find the quality of presentation in the lecture notes inadequate and/or wish to know more. These materials are not part of the course requirements.
General Textbooks
The recommended general presentation of machine learning is
Tom Mitchell: Machine Learning, McGraw-Hill 1997.
The standard textbook for computational learning theory is
Michael J. Kearns and Umesh V. Vazirani: An Introduction to Computational Learning Theory, MIT Press 1994,
but it has a somewhat different bias than our course.
There are various books on support vector machines and related techniques, among which
John Shawe-Taylor and Nello Cristianini: Kernel Methods for Pattern Analysis, Cambridge University Press 2004
is very recent and similar in spirit to our course. The earlier book by the same authors, An Introduction to Support Vector Machines, is also good.
Online learning
Remark: Our reading list for online learning is rather diverse. This reflects the fact that online learning is not as established as the other topics of the course: there are no textbooks and few survey articles. Also, this happens to be the personal favourite of the lecturer. For the other parts of the course, we can hopefully collect a more compact list of reference articles.
The Weighted Majority algorithm is analysed in
N. Littlestone and M. K. Warmuth: The weighted majority algorithm, Information and Computation 108(2):212–261, February 1994,
and the Weighted Average in
Jyrki Kivinen and Manfred K. Warmuth: Averaging expert predictions, in Proc. 4th European Conference on Computational Learning Theory, pages 153–167, Springer LNAI 1572, Berlin, March 1999.
The Aggregating Algorithm is due to Vovk. Perhaps the most accessible of his articles on this topic is
V. Vovk: A game of prediction with expert advice, Journal of Computer and System Sciences 56(2):153–173, April 1998.
However, it might be easier to start from
Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire and Manfred K. Warmuth: How to use expert advice, Journal of the ACM 44(3):427–485, May 1997,
which gives a detailed description of the absolute loss case (including the doubling trick), or
David Haussler, Jyrki Kivinen and Manfred K. Warmuth: Sequential prediction of individual sequences under general loss functions, IEEE Transactions on Information Theory 44(5):1906–1925, September 1998,
which covers the "nice" loss functions (square, log etc.). The tutorial
Jyrki Kivinen: Online learning of linear classifiers, in S. Mendelson and A. J. Smola, editors, Advanced Lectures on Machine Learning, pages 235–257, Springer LNCS 2600, Berlin, January 2003,
covers a lot of the recent theory of online linear classification, including p-norm Perceptrons, the Marginalised Perceptron and more. The examples of kernels are taken from Chapter 9 of the book by Shawe-Taylor and Cristianini (see above). For the original Winnow algorithm, see
Nick Littlestone: Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm, Machine Learning 2(4):285–318, April 1988.
Our analysis of Winnow is based on techniques from
N. Cesa-Bianchi: Analysis of two gradient-based algorithms for online regression, Journal of Computer and System Sciences 59(3):392–411, 1999.
The more advanced techniques for tuning the learning rate online are explained in
Peter Auer, Nicolò Cesa-Bianchi and Claudio Gentile: Adaptive and self-confident online learning algorithms, Journal of Computer and System Sciences 64(1):48–75, February 2002.
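For concreteness, the deterministic Weighted Majority algorithm analysed by Littlestone and Warmuth can be sketched in a few lines of Python. This is a minimal illustration of the binary prediction setting; the function and parameter names are ours, not from the paper:

```python
def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Run deterministic Weighted Majority and return the learner's mistake count.

    expert_preds: list of T rounds, each a list of n binary (0/1) expert predictions.
    outcomes: list of T true binary labels.
    beta: multiplicative penalty in (0, 1) applied to erring experts.
    """
    n = len(expert_preds[0])
    w = [1.0] * n                     # one weight per expert, initially 1
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        # Predict with the weighted majority vote of the experts.
        vote1 = sum(wi for wi, p in zip(w, preds) if p == 1)
        vote0 = sum(wi for wi, p in zip(w, preds) if p == 0)
        guess = 1 if vote1 >= vote0 else 0
        if guess != y:
            mistakes += 1
        # Multiply the weight of every expert that erred by beta.
        w = [wi * beta if p != y else wi for wi, p in zip(w, preds)]
    return mistakes
```

The mistake bound from the paper then says the learner's mistakes are at most a constant times (the best expert's mistakes plus log n), with the constant depending on beta.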
Statistical learning theory
The test set bound and the Occam's Razor bound are from
John Langford: Practical Prediction Theory for Classification (a tutorial presented at ICML 2003).
Our treatment of Rademacher complexity follows mainly the textbook of Shawe-Taylor and Cristianini. The proof of Theorem 3.24 is based on
Matti Kääriäinen: Relating the Rademacher and VC bounds, University of Helsinki, Department of Computer Science, Report C-2004-57, 2004.
For more background on statistical learning theory, see the tutorial
Shahar Mendelson: A few notes on statistical learning theory, in Advanced Lectures on Machine Learning, pages 1–40, Springer LNCS 2600, January 2003.
The conversion from online to statistical learning is given in
Nicolò Cesa-Bianchi, Alex Conconi and Claudio Gentile: On the generalization ability of on-line learning algorithms, IEEE Transactions on Information Theory 50(9):2050–2057, September 2004.
Support vector machines
We followed mainly Section 7.2 of the textbook by Shawe-Taylor and Cristianini (2004) cited above. The background on optimisation theory is based on
Stephen Boyd and Lieven Vandenberghe: Convex Optimization, Cambridge University Press 2004,
which is also freely available in electronic form.
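A toy sketch of the 1-norm soft margin SVM dual can be written as follows. Two simplifying assumptions of ours, not from the textbook: the bias term is dropped, so the box constraints 0 <= alpha_i <= C are the only constraints, and the dual is maximised by simple projected coordinate-wise gradient ascent rather than a proper QP solver:

```python
def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def train_svm_dual(X, y, C=1.0, eta=0.01, epochs=500, kernel=linear_kernel):
    """Maximise W(alpha) = sum_i alpha_i - (1/2) sum_ij alpha_i alpha_j y_i y_j K_ij
    subject to 0 <= alpha_i <= C, by projected gradient ascent (no bias term)."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            # Partial derivative of the dual objective with respect to alpha_i.
            g = 1.0 - y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            # Gradient step, projected back onto the box [0, C].
            alpha[i] = min(C, max(0.0, alpha[i] + eta * g))
    return alpha

def predict(alpha, X, y, x, kernel=linear_kernel):
    """Sign of the kernel expansion sum_i alpha_i y_i K(x_i, x)."""
    score = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X))
    return 1 if score >= 0 else -1
```

On a separable toy set such as X = [[2], [3], [-2], [-3]] with labels [1, 1, -1, -1], only the margin examples x = ±2 end up with nonzero alpha, as the KKT conditions predict.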
Boosting
This section was almost entirely based on the highly recommended survey article
Ron Meir and Gunnar Rätsch: An introduction to boosting and leveraging, in Advanced Lectures on Machine Learning, pages 118–183, Springer LNCS 2600, January 2003.
For additional material, see the extensive list of references in the survey.
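To make the boosting idea concrete, here is a compact AdaBoost sketch using one-dimensional threshold stumps as base learners. The setup (stump class, variable names) is an illustration of ours, not code from the survey:

```python
import math

def stump_predict(theta, sign, x):
    """Threshold stump on the real line: +sign above theta, -sign below."""
    return sign if x > theta else -sign

def adaboost(xs, ys, rounds=10):
    """Run AdaBoost on 1-D data with labels in {-1, +1}.
    Returns the ensemble as a list of (coefficient, theta, sign) triples."""
    m = len(xs)
    D = [1.0 / m] * m                      # distribution over examples
    ensemble = []
    thresholds = [min(xs) - 1] + sorted(set(xs))
    for _ in range(rounds):
        # Base learner: pick the stump with smallest weighted error under D.
        best = None
        for theta in thresholds:
            for sign in (1, -1):
                err = sum(d for d, x, y in zip(D, xs, ys)
                          if stump_predict(theta, sign, x) != y)
                if best is None or err < best[0]:
                    best = (err, theta, sign)
        err, theta, sign = best
        err = max(err, 1e-12)              # avoid division by zero below
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, theta, sign))
        # Reweight: increase the weight of misclassified examples, renormalise.
        D = [d * math.exp(-alpha * y * stump_predict(theta, sign, x))
             for d, x, y in zip(D, xs, ys)]
        total = sum(D)
        D = [d / total for d in D]
    return ensemble

def ensemble_predict(ensemble, x):
    """Weighted vote of the stumps, as in the final AdaBoost hypothesis."""
    score = sum(a * stump_predict(theta, sign, x) for a, theta, sign in ensemble)
    return 1 if score >= 0 else -1
```

The reweighting step is the essential mechanism: each round concentrates the distribution on the examples the current ensemble still gets wrong.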
Other resources
Journals and conferences
 Machine Learning
 Journal of Machine Learning Research
 IEEE Transactions on Information Theory
 Conference on Learning Theory (COLT, formerly Workshop on Computational Learning Theory): proceedings since 2001 published also online by Springer (search "colt 2001" etc.)
 International Conference on Machine Learning (ICML)
 Neural Information Processing Systems (NIPS)
 Machine Learning Theory by Avrim Blum, Carnegie Mellon University
 Machine Learning by Tom Dietterich, Oregon State University
 Machine Learning by Andrew Ng, Stanford University
 Ronald L. Rivest: Learning decision lists. Machine Learning 2(3):229246, 1987.
 SVM homepage
 Boosting homepage
 Association for Computational Learning
 Journal of Machine Learning Gossip