Data Mining (guided self study)

Algorithms and Machine Learning
Advanced studies
This course focuses on concepts and methods for frequent pattern discovery, also known as association analysis. This edition of the course is a structured and guided self-study course with weekly tasks and supervision, with mandatory attendance. Prerequisites: BSc degree and the course Introduction to Machine Learning or equivalent. Course book: Tan P., Steinbach M. & Kumar V.: Introduction to Data Mining, Chapters 6 and 7. Addison Wesley, 2006.
Year Semester Date Period Language Person in charge
2015 spring 11.03-29.04. 4-4 English Hannu Toivonen


Time Room Lecturer Date
Wed 12-14 C222 Hannu Toivonen 11.03.2015-29.04.2015


Group: 1
Time Room Instructor Date Notes
Mon 10-12 B221 Arto Vihavainen 13.03.2015-30.04.2015
Fri 14-16 B221 Arto Vihavainen 13.03.2015-30.04.2015

Registration for this course starts on Tuesday, 17 February, at 9.00.


This course will familiarize the participants with concepts and methods for identifying interesting patterns in large datasets. Data mining is about trying to make sense of data, usually without clear questions or clear success criteria. The course will focus on the discovery of frequent patterns in data, a fundamental data mining task that can help extract knowledge and previously unknown patterns even from largely unstructured data.
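
To give a rough idea of what "frequent patterns" means in practice, the following is a minimal Python sketch that counts how many transactions contain each small itemset and keeps those that reach a minimum support count. The function name `frequent_itemsets`, its parameters, and the toy data are assumptions made for this illustration; the counting is brute force and is not one of the algorithms taught in the course.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support_count} for itemsets of at most max_size items
    that occur in at least min_support transactions (brute-force counting)."""
    counts = Counter()
    for transaction in transactions:
        # Sort and deduplicate so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy transactions, used only for illustration.
baskets = [
    ["apples", "bananas"],
    ["apples", "cheese", "dates", "eggs"],
    ["bananas", "cheese", "dates", "figs"],
    ["apples", "bananas", "cheese", "dates"],
    ["apples", "bananas", "cheese", "figs"],
]

result = frequent_itemsets(baskets, min_support=3)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

Enumerating every subset of every transaction becomes infeasible for large data, which is why practical methods prune the search space using the support threshold.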

For unofficial IRC guidance, the channel #dm2015 has been set up on IRCNet.

Note! Please fill in the course feedback form at -- when you enter the page, select "Data Mining" from the course list. 

After this course, consider taking the Data Mining Project.


Completing the course

Although this is a self-study course, it contains scheduled activities that must be completed within a given time frame. The course is completed by

  1. carrying out weekly individual assignments and keeping a learning journal,
  2. participating in group work where the groups determine research questions and infer knowledge from a larger data set, and
  3. studying.

The Wednesday meetings are mandatory.

There are no traditional lectures; the learning approach taken in the course is self-study and group study. Participants will get guidance during the lab sessions, which are voluntary.

If you wish to complete the course without participating in any of the activities, take a separate exam. See for the exam schedule.

Literature and material

Course book: Tan P., Steinbach M. & Kumar V.: Introduction to Data Mining, Chapters 6 and 7. Addison Wesley, 2006. Links:

Material covered (also in separate exams): Chapters 6 and 7 of Tan et al., except Sections 6.2.4 (Support Counting), 6.3.2 (Rule Generation in Apriori Algorithm), 6.8 (Effect of Skewed Support Distribution), 7.5 (Subgraph Patterns), and 7.6 (Infrequent Patterns).

Course guidelines:

Can be found at


Individual assignments, week 1

Individual assignments, week 2

Individual assignments, week 3

Individual assignments, week 4

Individual assignments, week 5

Individual assignments, week 6

First group work assignment

Second group work assignment

Third group work assignment