Data Mining
Exam
Year | Semester | Date | Period | Language | In charge |
---|---|---|---|---|---|
2014 | spring | 10.03-24.04. | 4-4 | English | Fabio Cunial |
Lectures
Time | Room | Lecturer | Date |
---|---|---|---|
Mon 12-15 | D122 | Fabio Cunial | 10.03.2014-24.04.2014 |
Thu 12-15 | D122 | Fabio Cunial | 10.03.2014-24.04.2014 |
Time slots reserved for the lectures will also be used for exercises. Taking the course requires active participation in all time slots.
Information for international students
The language of the course (and of the instructor) is English: homework, reports, and oral and written exams will be in English. Please contact the instructor if you want to take the exam in Finnish or Swedish.
General
Data mining or knowledge discovery (tiedon louhinta in Finnish) is the theory of discovering regularities and repetitions in discrete datasets. The course presents the theory and algorithms of a general, domain-independent class of regularities (frequent itemsets and association rules), and applies these concepts to real-world datasets. For a list of key concepts covered by the course, see the list of topics.
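To make the two central notions concrete, here is a minimal sketch of support (for frequent itemsets) and confidence (for association rules), assuming the standard textbook definitions. The toy transactions and the function names are hypothetical, chosen only for illustration. The enumeration is brute force and is not one of the efficient algorithms covered in the course.

```python
from itertools import combinations

# Hypothetical toy market-basket data (an assumption, not course material).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for itemsets of size <= max_size
    whose support is at least `min_support`."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent: among the
    transactions containing the antecedent, the fraction that also
    contain the consequent."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / support_count(antecedent, transactions)


if __name__ == "__main__":
    for itemset, s in frequent_itemsets(TRANSACTIONS, min_support=0.5).items():
        print(sorted(itemset), s)
    print(confidence({"diapers"}, {"beer"}, TRANSACTIONS))
```

On this toy data, the itemset {diapers, beer} appears in 3 of the 5 transactions, and the rule diapers -> beer holds in 3 of the 4 transactions that contain diapers.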
Completing the course
The course assumes that students have a BS degree and that they have taken an "introduction to machine learning" course or equivalent. There are three strategies for completing the course, two of which are mutually incompatible:
S1: Active participation in all lectures, and submitting all project reports. Students are evaluated on the basis of their reports, on the curiosity and initiative they display in class, and on their presentations (see below).
S1.1: Like strategy (S1), but a student can skip one project and replace it with creating or improving a number of Wikipedia pages related to the course. If you are new to Wikipedia editing, try this basic training and this quick-start guide. You might also browse a list of high-quality Wikipedia pages in computational biology to set your quality standard.
S2: Taking a final written exam, without attending any lecture and without submitting any project report.
In strategy (S1), attending the first lecture of the course is mandatory: students cannot join the course after the first lecture. In strategy (S1.1), the student proposes to the instructor the number and extent of Wikipedia edits she is planning to perform, and the instructor accepts or revises this plan.
Literature and material
Textbook
Tan, Steinbach, Kumar (2006). "Introduction to data mining", Pearson Education (at amazon.com). The course covers only chapters 6 ("Association analysis: basic concepts and algorithms") and 7 ("Association analysis: advanced concepts"). Sample chapters and slides are available at the textbook website.
Additional material
- Mannila, Toivonen (2002). "Knowledge discovery in databases: the search for frequent patterns". For slightly more advanced variations on the main themes of the course.
- Leskovec, Rajaraman, Ullman (2014). "Mining of massive datasets". For slightly more advanced variations on the main themes of the course.
- Bart Goethals (2004). Repository of frequent itemset mining implementations.
- Fournier-Viger, P., Gomariz, A., Soltani, A., Gueniche, T. (2013). SPMF: A sequential pattern mining framework.
- Research papers distributed by the instructor during the course.
- Parida (2008). "Pattern discovery in bioinformatics: theory and algorithms". Chapman & Hall/CRC. The main reference for the last three weeks of the course. Advanced pattern discovery algorithms on strings.
Top conferences and journals (for the enthusiast)
- 2013 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
- 2013 IEEE International Conference on Data Mining (ICDM)
- 2013 SIAM International Conference on Data Mining (SDM)
- 2013 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)
- Data Mining and Knowledge Discovery