Data Mining (guided self study)

Basic information

Course code: 582634

Credits: 5

Specialization: Algorithms and machine learning

Level: Advanced studies

Description:

This course focuses on concepts and methods for frequent pattern discovery, also known as association analysis. This edition of the course is a structured and guided self-study course with weekly tasks and supervision, with mandatory attendance. Prerequisites: BSc degree and the course Introduction to Machine Learning or equivalent. Course book: Tan P., Steinbach M. & Kumar V.: Introduction to Data Mining, Chapters 6 and 7. Addison Wesley, 2006.

Year	Semester	Date	Period	Language	Person in charge
2017	spring	20.01-03.03.	3-3	English	Hannu Toivonen

Lectures

Time	Room	Lecturer	Date
Fri 10-12	B222	Hannu Toivonen	20.01.2017-03.03.2017

Information for international students

This course is given in English.

General

This course will familiarize the participants with concepts and methods for identifying interesting patterns in large datasets. Data mining is about trying to make sense of data, usually without clear questions or clear success criteria. The course will focus on the discovery of frequent patterns in data, a fundamental data mining task that can help extract knowledge and previously unknown patterns even from largely unstructured data.
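
As a rough illustration of what discovering frequent patterns means, the following minimal Python sketch finds all itemsets that occur in at least a given number of transactions, using a level-wise (Apriori-style) search. The function name `frequent_itemsets`, the toy `baskets` data and the threshold are assumptions made for this example only; notation and details in the course book may differ.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets in >= min_support transactions.

    Illustrative level-wise search, not an optimized implementation.
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items]
    level = [s for s in level if support(s) >= min_support]

    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s)
        k += 1
        previous = set(level)
        # Candidate k-itemsets: unions of two frequent (k-1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = [
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        ]
        level = [c for c in candidates if support(c) >= min_support]
    return result


# Hypothetical toy data: five market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

found = frequent_itemsets(baskets, min_support=3)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a threshold of 3, this toy data yields four frequent single items and four frequent pairs. The triple bread, milk, diapers is generated as a candidate but occurs in only two baskets, so it is not reported.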

Completing the course

This instance of the course is based on self-study according to a given study schedule, supported by weekly mentoring by the professor. Mentoring follows the so-called flipped classroom model: students study the material first, and the Friday meetings are used to answer students' questions, fill gaps, etc.

The course is completed solely by taking a final exam on 10 March 2017 (or 25 April). Check https://www.cs.helsinki.fi/en/exams for possible changes to the exam schedule. Participation in the Friday sessions is voluntary. There are no exercise sessions.

NEW (24 Mar 2017): The exam has been graded and results are available at https://ilmo.cs.helsinki.fi/tulokset/studies. You should be able to see your points for each task in the exam. If you have any questions, contact Hannu by dropping by his lab (rooms B233/B232) without an appointment. Good times to find him: Wed (29 Mar) 9:30-12, Thu (30 Mar) 14-16, Fri (31 Mar) 10-14.

Schedule

The following topics are to be studied before the respective meeting date. The meetings are based on students' needs, not on planned lectures.

Week 2: Frequent itemset generation (Sections 6.1-6.2 except 6.2.4)
Week 3: Compact representation of frequent itemsets (Section 6.4)
Week 4: Alternative methods for generating frequent itemsets and FP-growth (Sections 6.5-6.6)
Week 5: Rule generation and evaluation of association patterns (Sections 6.3 and 6.7 except 6.3.2)
Week 6: Handling categorical and continuous attributes and a concept hierarchy (Sections 7.1-7.3) (NB: Hannu will not be present this week)
Week 7: Sequential patterns (Section 7.4)

The course has a closed Facebook group where students can share hints and ask for and give advice regarding the course.

Feedback

Please give feedback on the course using the department's anonymous feedback form (look for "Data Mining" under Advanced studies). Thank you.

Literature and material

Course book: Tan P., Steinbach M. & Kumar V.: Introduction to Data Mining, Chapters 6 and 7. Addison Wesley, 2006. Links:

Book home page
Electronic copy of Chapter 6 (from the book home page; Chapter 7 is not available)
Slides associated with the book
Errata

Additional material on the same topics (note: notations may differ):

Short encyclopedic entries for Frequent pattern, Frequent itemset, Apriori algorithm, Association rule, Basket analysis
Very good slides on FPgrowth by Florian Verhein
Text on Frequent pattern generation
Slides on Closed sets
Slides on FPtree

Useful material for self-study can also be found in previous editions of this course:

Many of the exercises done in the classroom are from the "weekly tests" of the 2016 course. The 2016 course page also contains links to their solutions.

Definitive course contents (covered in exams)

Chapters 6 and 7 of Tan et al., except the following: Sections 6.2.4 (Support Counting), 6.3.2 (Rule Generation in Apriori Algorithm), 6.8 (Effect of Skewed Support Distribution), 7.5 (Subgraph Patterns), 7.6 (Infrequent Patterns).

Example exam from 2016

Address: Department of Computer Science, P.O. Box 68 (Gustaf Hällströmin katu 2b), 00014 University of Helsinki
Opening hours: Normally Mon - Fri 7.45-19.45 during the autumn and spring terms.
Phone: 0294 1911 (university switchboard)
Email: Service addresses
Fax: 09 876 4314


Department of Computer Science [pre 2018 site]

University of Helsinki

Faculty of Science