58147-9 Machine Learning, Spring 2005
The course is over, but you can still give feedback.
- exam problems
- tenttitehtävät suomeksi
- (some kind of) model solutions to the exam will appear here soon
-
results
Instruction
The course consists of traditional lectures, exercice sessions and an examination. Some of the exercises include simple computer experiments, but the main focus in on developing the theory.
- Lectures
- Jyrki Kivinen 25.1.-13.4. Tue 16-18, Wed 14-16 D122
- Exercise sessions
- Jussi Lindgren 31.1.-22.4. Wed 16-18 B119
- Examination
- Fri 29.4. at 14-18 A111
Course material
Course description (in Finnish)
There is no textbook. The examination will be based on material that has been covered in the lectures and exercises. Some suggestions for additional reading are at the end of this page.
Lecture notes: PostScript four slides per sheet, for printing; PDF for viewing on screen. A paper version is available in the course folder in room C127 for copying. (Currently Chapter 1 is still in Finnish, the rest in English; sorry for the confusion.)
Exercises:
- homework 2.2.: problems PS, PDF, solutions.
- homework 9.2.: problems PS, PDF, solutions.
- homework 16.2.: problems PS, PDF; solutions.
- homework 23.2.: problems PS, PDF; solutions.
- homework 2.3.: problems
PS,
PDF;
solutions
(the URL's in the paper have a typo: "jtlindgre" should be "jtlindgr")
data Matlab, R - homework 9.3.: problems PS, PDF; solutions
- homework 16.3.: problems PS, PDF; solutions
- homework 23.3.: problems PS, PDF; solutions
- homework 6.4.: problems PS, PDF; solutions
- homework 13.4.: problems
PS,
PDF;
solutions
OSU SVM Toolbox
Artificial data for Problem 4 - homework 20.4.: problems PS, PDF; solutions
Additional reading material
This listing is given for those who find the quality of presentation in the lecture notes inadequate and/or wish to know more. They are not part of the course requirements.
General Textbooks
The recommended general presentation of machine learning is
Tom Mitchell: Machine Learning, McGraw Hill 1997.The standard textbook for computational learning theory is
Michael J. Kearns and Umesh V. Vazirani: An Introduction to Computational Learning Theory, MIT Press 1994but it has somewhat different bias than our course.
There are various books on support vector machines and related techniques, among which
John Shawe-Taylor and Nello Cristianini: Kernel Methods for Pattern Analysis, Cambridge University Press 2004is very recent and similar in spirit to our course. The earlier book by the same authors, An Introduction to Support Vector Machines, is also good.
Online learning
Remark: Our reading list for online learning is rather diverse. This reflects the fact that online learning is not as established as the other topics of the course. There are no text books and few survey articles. Also, this happens to be the personal favourite of the lecturer. For the other parts of the course, we can hopefully collect a more compact list of reference articles.
The Weighted Majority algorithm is analysed in
N. Littlestone and M. K. Warmuth: The weighted majority algorithm, Information and Computation 108(2):212-261, February 1994and the Weighted Average in
Jyrki Kivinen and Manfred K. Warmuth: Averaging expert predictions, in Proc. 4th European Conference on Computational Learning Theory, pages 153-167, Springer LNAI 1572, Berlin, March 1999.The Aggregating Algorithm is due to Vovk. Perhaps the most accessible of his articles on this topic is
V. Vovk: A game of prediction with expert advice, Journal of Computer and System Sciences 56(2):153-173, April 1998.However, it might be easier to start from
Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire and Manfred K. Warmuth: How to use expert advice, Journal of the ACM 44(3):427 - 485, May 1997,which gives a detailed description of the absolute loss case (including the doubling trick) or
David Haussler, Jyrki Kivinen and Manfred K. Warmuth: Sequential prediction of individual sequences under general loss functions, IEEE Transactions on Information Theory 44(5):1906-1925, September 1998.that covers the "nice" loss functions (square, log etc.). The tutorial
Jyrki Kivinen: Online learning of linear classifiers, In S. Mendelson and A. J. Smola, editors, Advanced Lectures on Machine Learning, pages 235-257, Springer LNCS 2600, Berlin, January 2003covers a lot of the recent theory of online linear classification, including p-norm Perceptrons, the Marginalised Perceptron and more. The examples of kernels are taken from Chapter 9 of the book by Shawe-Taylor and Cristianini (see above). For the original Winnow algorithm, see
Nick Littlestone: Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm, Machine Learning 2(4):285-318, April 1988.Our analysis of Winnow is based on techniques from
N. Cesa-Bianchi: Analysis of two gradient-based algorithms for on-line regression, Journal of Computer and System Sciences 59(3):392-411, 1999.The more advanced techniques for tuning the learning rate online are explained in
Peter Auer, Nicolò Cesa-Bianchi and Claudio Gentile: Adaptive and self-confident on-line learning algorithms, Journal of Computer and System Sciences 64(1):48-75, February 2002.
Statistical learning theory
The test set bound and Occam's Razor bound are from
John Langford: Practical Prediction Theory for Classification (a tutorial presented at ICML 2003).Our treatment of Rademacher complexity follows mainly the textbook of Shawe-Taylor and Cristianini. The proof of Theorem 3.24 is based on
Matti Kääriäinen: Relating the Rademacher and VC bounds. University of Helsinki, Department of Computer Science, Report C-2004-57, 2004.For the more background on statistical learning theory, see the tutorial
Shahar Mendelson: A few notes on statistical learning theory. In Advanced Lectures on Machine Learning, pages 1-40, Springer LNCS 2600, January 2003.The conversion from online to statistical learning is given in
Nicolò Cesa-Bianchi, Alex Conconi and Claudio Gentile: On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory 50(9):2050-2057, September 2004.
Support vector machines
We followed mainly Section 7.2 of the textbook Shawe-Taylor and Cristianini (2004) cited above. The background on optimisation theory is based on
Stephen Boyd and Lieven Vandenberghe: Convex Optimization. Cambridge University Press 2004that is also freely available in electronic form.
Boosting
This section was almost entirely based on the highly recommended survey article
Ron Meir and Gunnar Rätsch: An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning, pages 118-183, Springer LNCS 2600, January 2003.For additional material, see the extensive list of references in the survey.
Other resources
Journals and conferences
- Machine Learning
- Journal of Machine Learning Research
- IEEE Transactions on Information Theory
- Conference on Learning Theory (COLT, formerly Workshop on Computational Learning Theory): proceedings since 2001 published also online by Springer (search "colt 2001" etc.)
- International Conference on Machine Learning (ICML)
- Neural Information Processing Systems (NIPS)
- Machine Learning Theory by Avrim Blum, Carnegie Mellon University
- Machine Learning by Tom Dietterich, Oregon State University
- Machine Learning by Andrew Ng, Stanford University
- Ronald L. Rivest: Learning decision lists. Machine Learning 2(3):229-246, 1987.
- SVM homepage
- Boosting homepage
- Association for Computational Learning
- Journal of Machine Learning Gossip

