Introduction to Machine Learning

582631
5 credits
Algorithms and Machine Learning
Advanced studies
Basic concepts and methods of machine learning, in theory and in practice. Supervised learning (classification, regression) and unsupervised learning (clustering). The course serves as preparation for various courses on data analysis, machine learning and bioinformatics. Course book: An Introduction to Statistical Learning with Applications in R, Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, Springer, 2013.

Exam

13.12.2011 16.00 A111
Year: 2011
Semester: autumn
Dates: 01.11.-09.12.
Period: 2
Language: English
Responsible teacher: Patrik Hoyer

Lectures

Time Room Lecturer Dates
Tue 10-12 D122 Patrik Hoyer 01.11.2011-09.12.2011
Fri 10-12 D122 Patrik Hoyer 01.11.2011-09.12.2011

Exercise groups

Group: 1
Time Room Instructor Dates Notes
Tue 12-14 B221 Doris Entner 01.11.2011
Thu 12-14 D122 Doris Entner 07.11.2011-09.12.2011
Group: 2
Time Room Instructor Dates Notes
Fri 12-14 B221 Antti Hyttinen 04.11.2011
Fri 12-14 D122 Antti Hyttinen 07.11.2011-09.12.2011

Registration for this course starts on Tuesday October 11th at 9.00. There is additional guidance on Matlab/R on Tue November 1st at 12-14 in B221 and on Fri November 4th at 12-14 in B221.

General

Quick link: course Moodle page.

Feedback: Here is a summary of feedback received, with the lecturer's comments.

Machine learning and data mining deal with designing computer algorithms that find interesting patterns in data and learn from experience. As the cost of measuring, storing, and transmitting data has plummeted in recent years, the amount of data collected and analyzed in both business and scientific applications has grown at an amazing pace.

For example, internet search engine companies today routinely use machine learning techniques to help users find the information they seek, the financial sector uses data mining techniques to identify fraudulent credit card transactions, and medical companies use statistical methods in drug development. Today, almost every business uses some form of data analysis.

Similarly, much of modern science depends on computational methods for discovering relationships between variables in high-dimensional datasets. In bioinformatics, the advent of technology for sequencing whole genomes and measuring the expression of thousands of genes has required the development of completely new data analysis methods. In many other fields as well, sensors have become so cheap that the main bottleneck is the analysis of the resulting data rather than the measurement technology.

This course provides an introduction to machine learning and data mining techniques, and serves as preparation for a variety of courses on data analysis, machine learning, and bioinformatics. While one goal is to present a broad overview of the field, the course will also give the students a basic understanding of standard problems such as classification, regression, data clustering, and anomaly detection. The students will obtain an understanding of the relevant techniques by applying them to real-world data sets.

Main themes and learning objectives: Detailed here in English and in Finnish

Course staff: The lectures will be given by Dr. Patrik Hoyer, and the exercises will be held by Doris Entner and Antti Hyttinen. There are no designated office hours; please arrange an appointment by e-mail if necessary.

Exercises in the first week: Instead of the regular exercises, during the first week (1.11 and 4.11) there will be guidance on the software (Matlab/Octave/R) used in the course. There are two sessions: Tuesday 12-14 (in B221) and Friday 12-14 (in B221). (Note: these sessions are identical, so please attend only one.) The purpose is to familiarize students with the software packages used in the course for implementing the various algorithms.

Completing the course

In addition to the lectures, the course consists of weekly exercises and a final exam. The exercises constitute 40% of the course total, while the exam makes up 60% of the total points. To pass the course, the student must

  • obtain at least half of the points available in the weekly exercises, and
  • pass the final exam (obtain at least half the available points in the exam).
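The pass rule above combines a weighted total with two separate thresholds. The Python sketch below illustrates it, assuming that points are expressed as fractions of the maximum available in each part (1.0 meaning full points). `course_result` is a hypothetical helper, not an official course tool, and the page does not say how the weighted total maps to a final grade.

```python
# Hypothetical illustration of the course pass rule; not an official tool.
# Points are fractions of the maximum available in each part (1.0 = full points).

EXERCISE_WEIGHT = 0.4  # weekly exercises: 40% of the course total
EXAM_WEIGHT = 0.6      # final exam: 60% of the course total
PASS_THRESHOLD = 0.5   # at least half of the available points in each part


def course_result(exercise_fraction: float, exam_fraction: float) -> tuple[float, bool]:
    """Return (weighted total as a fraction of the maximum, whether the course is passed)."""
    for value in (exercise_fraction, exam_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    total = EXERCISE_WEIGHT * exercise_fraction + EXAM_WEIGHT * exam_fraction
    # Both thresholds must be met; a high weighted total alone is not enough.
    passed = exercise_fraction >= PASS_THRESHOLD and exam_fraction >= PASS_THRESHOLD
    return total, passed


if __name__ == "__main__":
    # Full exercise points but only 40% in the exam: total of roughly 0.64, not passed.
    print(course_result(1.0, 0.4))
    # Exactly half in both parts: total of roughly 0.5, passed.
    print(course_result(0.5, 0.5))
```

The thresholds are checked independently of the weighted total, which is why the first example fails even though its total is above one half.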

Weekly exercise points are awarded for students turning in their solutions to the weekly exercises, and for attending the exercise sessions. More details will be given at the start of the course.

Prerequisites:

  • Some programming skills (we will use Matlab/Octave/R but no prior exposure to these particular environments is needed)
  • Basic probability theory and linear algebra (the course textbook provides a refresher on the most basic concepts in its appendices).

Please register for the course using the university registration system (see the link on the left). Only registered students can be assigned credits.

Please also sign up for the course in Moodle. All course material will be available in Moodle, and students will be kept informed of current course events by email sent from Moodle.

For those wishing to take a renewal exam or a separate exam in the spring/summer/fall of 2012, all the details and instructions will be provided in Moodle. Please 'sign up' for the course to get access to the material. In brief: anybody eligible to take the course exam can retake it in the spring/summer/fall of 2012 without additional exercises (with 40% of the final grade based on the weekly exercises from the course in the autumn). Anybody who wishes to take part in a separate exam (in which the weekly exercise points are not counted) needs to first successfully complete some programming exercises. These (and the instructions) will be put in Moodle.

Literature and material

The textbook for the course consists of selected parts of: Tan, Steinbach, Kumar (2005): Introduction to Data Mining (publisher, amazon.co.uk, bookplus)

There are a total of 12 copies available in the Kumpula Science Library (of these, one copy is a "reading room copy" and so cannot be borrowed; it should always be there).

Other material (e.g. reading lists, lecture slides, exercise sets, instructions and documentation for Matlab, Octave, and R, and links to the datasets used) will be put in Moodle as the course progresses.