Data Mining

582634
5
Algorithms and machine learning
Advanced studies
This course focuses on concepts and methods for frequent pattern discovery, also known as association analysis. This edition of the course is a structured and guided self-study course with weekly tasks and supervision, with mandatory attendance. Prerequisites: BSc degree and the course Introduction to Machine Learning or equivalent. Course book: Tan P., Steinbach M. & Kumar V.: Introduction to Data Mining, Chapters 6 and 7. Addison Wesley, 2006.

Exam

29.04.2014 16.00 A111
Year Semester Date Period Language In charge
2014 spring 10.03-24.04. 4-4 English Fabio Cunial

Lectures

Time Room Lecturer Date
Mon 12-15 D122 Fabio Cunial 10.03.2014-24.04.2014
Thu 12-15 D122 Fabio Cunial 10.03.2014-24.04.2014

Time slots reserved for the lectures will also be used for exercises. Taking the course requires active participation in all time slots.

Information for international students

The language of the course (and of the instructor) is English: homework, reports, and oral and written exams will be in English. Please contact the instructor if you want to take the exam in Finnish or Swedish.

General

Data mining or knowledge discovery (tiedon louhinta in Finnish) is the theory of discovering regularities and repetitions in discrete datasets. The course presents the theory and algorithms of a general, domain-independent class of regularities (frequent itemsets and association rules), and applies these concepts to real-world datasets. For a list of key concepts covered by the course, see the list of topics.
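As a minimal illustration of the two central notions (frequent itemsets and association rules), the sketch below computes support and confidence on a toy set of transactions. The dataset, the function names and the `max_size` parameter are assumptions made for this example. The enumeration is deliberately brute force, and it is not one of the algorithms taught in the course.

```python
from itertools import combinations

# Hypothetical toy dataset: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent.

    Defined as count(antecedent | consequent) / count(antecedent).
    """
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force enumeration of all itemsets of size <= max_size
    whose support is at least `min_support`."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_size + 1):
        for candidate in combinations(items, k):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


if __name__ == "__main__":
    print(support({"diapers", "beer"}, transactions))
    print(confidence({"diapers"}, {"beer"}, transactions))
    for itemset, s in frequent_itemsets(transactions, 0.6).items():
        print(sorted(itemset), s)
```

Brute-force enumeration is exponential in the number of items, so it is only practical for tiny examples like this one.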

Completing the course

The course assumes that students have a BSc degree and have taken the course Introduction to Machine Learning or an equivalent. There are three strategies for completing the course, two of which are mutually incompatible:

S1: Active participation in all lectures, and submitting all project reports. Students are evaluated on the basis of their reports, on the curiosity and initiative they display in class, and on their presentations (see below).

S1.1: Like strategy (S1), but a student can skip one project and replace it with creating or improving a number of Wikipedia pages related to the course. If you are new to Wikipedia editing, try this basic training and this quick-start guide. You might also browse a list of high-quality Wikipedia pages in computational biology to set your quality standard.

S2: Taking a final written exam, without attending any lecture and without submitting any project report.

In strategy (S1), attending the first lecture of the course is mandatory: students cannot join the course after the first lecture. In strategy (S1.1), the student proposes to the instructor the number and extent of the Wikipedia edits she plans to perform, and the instructor accepts or revises this plan.

Literature and material

Textbook

Tan, Steinbach, Kumar (2006). "Introduction to Data Mining", Pearson Education (at amazon.com). The course covers only chapters 6 ("Association analysis: basic concepts and algorithms") and 7 ("Association analysis: advanced concepts"). Sample chapters and slides are available at the textbook website.

Additional material

Top conferences and journals (for the enthusiast)