Data Mining (guided self study)

Algorithms and Machine Learning
Advanced studies
This course focuses on concepts and methods for frequent pattern discovery, also known as association analysis. This edition of the course is a structured and guided self-study course with weekly tasks and supervision, with mandatory attendance. Prerequisites: BSc degree and the course Introduction to Machine Learning or equivalent. Course book: Tan P., Steinbach M. & Kumar V.: Introduction to Data Mining, Chapters 6 and 7. Addison Wesley, 2006.
Year Semester Date Period Language Person in charge
2017 spring 20.01-03.03. 3-3 English Hannu Toivonen


Time Room Lecturer Date
Fri 10-12 B222 Hannu Toivonen 20.01.2017-03.03.2017

Information for international students

This course is given in English.


This course will familiarize the participants with concepts and methods for identifying interesting patterns in large datasets. Data mining is about trying to make sense of data, usually without clear questions or clear success criteria. The course will focus on the discovery of frequent patterns in data, a fundamental data mining task that can help extract knowledge and previously unknown patterns even from largely unstructured data.
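As a rough illustration of what "frequent pattern discovery" means, the following Python sketch finds all itemsets that occur in at least a given number of transactions. It is a minimal level-wise search with Apriori-style pruning, not the course's reference implementation; the function name `frequent_itemsets`, the parameter `min_count`, and the toy market-basket data are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Return {itemset: support count} for itemsets in >= min_count transactions.

    Hypothetical helper for illustration only: a level-wise search that
    builds size-(k+1) candidates from the frequent size-k itemsets.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct item on its own.
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        frequent = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_count:
                frequent[cand] = count
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and keep only
        # those whose k-subsets are all frequent (Apriori pruning).
        k += 1
        next_candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                next_candidates.add(union)
        candidates = list(next_candidates)
    return result


# Toy data (an assumption for this example): five shopping baskets.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]
result = frequent_itemsets(transactions, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With `min_count=3`, the pair {Diapers, Beer} counts as frequent, while {Bread, Milk, Diapers} does not because only two baskets contain all three items.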

Completing the course

This instance of the course is based on self-study, following a given study schedule and supported by weekly mentoring by the professor. Mentoring follows the so-called flipped classroom model: students study the material first, and the Friday meetings are used to answer students' questions, fill in gaps, and so on.

The course is completed solely by taking a final exam on 10 March 2017 (or 25 April). Check for possible changes to the exam schedule. Participation in the Friday sessions is voluntary. There are no exercise sessions.

NEW (24 Mar 2017): The exam has been graded and the results are available. You should be able to see your points for each task in the exam. If you have any questions, contact Hannu by dropping by his lab (rooms B233/B232) without an appointment. Good times to find him: Wed (29 Mar) 9:30-12, Thu (30 Mar) 14-16, Fri (31 Mar) 10-14.


The following topics are to be studied before the respective meeting date. The meetings are based on students' needs, not on planned lectures.  

  • Week 2: Frequent itemset generation (Sections 6.1-6.2 except 6.2.4)
  • Week 3: Compact representation of frequent itemsets (Section 6.4)
  • Week 4: Alternative methods for generating frequent itemsets and FP-growth (Sections 6.5-6.6)
  • Week 5: Rule generation and evaluation of association patterns (Sections 6.3 and 6.7 except 6.3.2)
  • Week 6: Handling categorical and continuous attributes and a concept hierarchy (Sections 7.1-7.3) (NB: Hannu will not be present this week)
  • Week 7: Sequential patterns (Section 7.4)

The course has a closed Facebook group where students can share hints as well as ask for and give advice regarding the course.


Please give feedback for the course using the department's anonymous feedback form (look for "Data Mining" under Advanced studies). Thank you.

Literature and material

Course book: Tan P., Steinbach M. & Kumar V.: Introduction to Data Mining, Chapters 6 and 7. Addison Wesley, 2006. Links:

Additional material on the same topics (note: notations may differ):

Useful material for self studies can also be found from previous editions of this course:

Many of the exercises done in the classroom are from the "weekly tests" of the 2016 course. The 2016 course page also contains links to their solutions.

Definitive course contents (covered in exams) 

Chapters 6 and 7 of Tan et al., except the following: Sections 6.2.4 (Support Counting), 6.3.2 (Rule Generation in Apriori Algorithm), 6.8 (Effect of Skewed Support Distribution), 7.5 (Subgraph Patterns), 7.6 (Infrequent Patterns).