Data Mining (guided self study)
Year | Semester | Date | Period | Language | Person in charge |
---|---|---|---|---|---|
2017 | spring | 20.01-03.03. | 3-3 | English | Hannu Toivonen |
Lectures
Time | Room | Lecturer | Date |
---|---|---|---|
Fri 10-12 | B222 | Hannu Toivonen | 20.01.2017-03.03.2017 |
Information for international students
This course is given in English.
General
This course will familiarize the participants with concepts and methods for identifying interesting patterns in large datasets. Data mining is about trying to make sense of data, usually without clear questions or clear success criteria. The course will focus on the discovery of frequent patterns in data, a fundamental data mining task that can help extract knowledge and previously unknown patterns even from largely unstructured data.
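To make the notion of a frequent pattern concrete, here is a minimal sketch in Python. It is not taken from the course material: the function name `frequent_itemsets`, the toy transactions, and the support threshold are illustrative assumptions. It uses plain brute-force enumeration of every subset of every transaction, not the Apriori or FP-growth algorithms studied in the course, so it is only practical for very small examples.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs
    in at least `min_support` transactions (brute-force enumeration)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Count every non-empty subset of this transaction once.
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical market-basket data, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

result = frequent_itemsets(transactions, min_support=3)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With a minimum support of 3, the only frequent itemset of size two in this toy data is {bread, milk}. The cost of enumerating all subsets grows exponentially with transaction length, which is the motivation for the level-wise and tree-based methods listed in the schedule.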
Completing the course
This instance of the course is based on self-study, following a given study schedule and supported by weekly mentoring by the professor. Mentoring follows the so-called flipped classroom model: students study the material first, and the Friday meetings are used to answer students' questions, fill in gaps, etc.
The course is completed solely by taking a final exam on 10 March 2017 (or 25 April). Check https://www.cs.helsinki.fi/en/exams for possible changes to the exam schedule. Participation in the Friday sessions is voluntary. There are no exercise sessions.
NEW (24 Mar 2017): The exam has been graded and results are available at https://ilmo.cs.helsinki.fi/tulokset/studies. You should be able to see your points for each exam task. If you have any questions, contact Hannu by dropping by his lab (rooms B233/B232) without an appointment. Good times to find him: Wed (29 Mar) 9:30-12, Thu (30 Mar) 14-16, Fri (31 Mar) 10-14.
Schedule
The following topics are to be studied before the respective meeting date. The meetings are based on students' needs, not on planned lectures.
- Week 2: Frequent itemset generation (Sections 6.1-6.2 except 6.2.4)
- Week 3: Compact representation of frequent itemsets (Section 6.4)
- Week 4: Alternative methods for generating frequent itemsets and FP-growth (Sections 6.5-6.6)
- Week 5: Rule generation and evaluation of association patterns (Sections 6.3 and 6.7 except 6.3.2)
- Week 6: Handling categorical and continuous attributes and a concept hierarchy (Sections 7.1-7.3) (NB: Hannu will not be present this week)
- Week 7: Sequential patterns (Section 7.4)
The course has a closed Facebook group where students can share hints and ask for or give advice about the course.
Feedback
Please give feedback for the course using the department's anonymous feedback form (look for "Data Mining" under Advanced studies). Thank you.
Literature and material
Course book: Tan P., Steinbach M. & Kumar V.: Introduction to Data Mining, Chapters 6 and 7. Addison Wesley, 2006. Links:
- Book home page
- Electronic copy of Chapter 6 (from the book home page; Chapter 7 is not available)
- Slides associated with the book
- Errata
Additional material on the same topics (note: notations may differ):
- Short encyclopedic entries for Frequent pattern, Frequent itemset, Apriori algorithm, Association rule, Basket analysis
- Very good slides on FPgrowth by Florian Verhein
- Text on Frequent pattern generation
- Slides on Closed sets
- Slides on FPtree
Useful material for self-study can also be found in previous editions of this course.
Many of the exercises done in the classroom are from the "weekly tests" of the 2016 course. The 2016 course page also contains links to their solutions.
Definitive course contents (covered in exams)
Chapters 6 and 7 of Tan et al., except the following: Sections 6.2.4 (Support Counting), 6.3.2 (Rule Generation in Apriori Algorithm), 6.8 (Effect of Skewed Support Distribution), 7.5 (Subgraph Patterns), 7.6 (Infrequent Patterns).