Data Mining

582634
5
Algorithms and machine learning
Advanced studies
This course focuses on concepts and methods for frequent pattern discovery, also known as association analysis. This edition of the course is a structured and guided self-study course with weekly tasks and supervision, with mandatory attendance. Prerequisites: BSc degree and the course Introduction to Machine Learning or equivalent. Course book: Tan P., Steinbach M. & Kumar V.: Introduction to Data Mining, Chapters 6 and 7. Addison Wesley, 2006.
Year Semester Date Period Language In charge
2013 spring 11.03-25.04. 4-4 English Hannu Toivonen

Lectures

Time Room Lecturer Date
Mon 12-15 D122 Hannu Toivonen 11.03.2013-25.04.2013
Thu 12-15 D122 Hannu Toivonen 11.03.2013-25.04.2013

Time slots reserved for the lectures will also be used for exercises. Taking the course requires active participation in all time slots.

Information for international students

The course will be taught in English. Much of the reporting and the oral examinations will take place in groups, in English. (Alternatively, the course can be taken by an exam, in English, Finnish, or Swedish.)

General

Attendance at the first lecture is absolutely obligatory. Students cannot join the course after the first lecture.

Data mining or knowledge discovery ("tiedon louhinta" in Finnish) is the process of discovering interesting regularities in large masses of data. This course will focus on a fundamental and generic class of regularities, that of frequent patterns, also known as association analysis.
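To give a rough idea of what a frequent pattern is, here is a minimal sketch in Python. It is not taken from the course material: the function name frequent_itemsets, the parameters min_support and max_size, and the toy shopping baskets are all made up for illustration. It counts itemsets by brute force, which only works for tiny data; more efficient algorithms are covered in Chapter 6 of the course book.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for all itemsets with at most max_size
    items whose support (fraction of transactions containing the itemset)
    is at least min_support. Brute-force counting, for illustration only."""
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Count every subset of the transaction up to max_size items.
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical toy data: four shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

result = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), support)
```

With a minimum support of 0.5, an itemset counts as frequent here if it occurs in at least two of the four baskets. For example, {bread, milk} is frequent, while {bread, beer} occurs in only one basket and is not.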

The course uses a problem-based approach where students learn by actively acquiring knowledge and skills, individually and in groups, to solve data mining challenges identified during the course. Participation in the course requires commitment and initiative, as well as regular and active attendance in the course meetings on Mon and Thu at 12-15. An alternative to course participation is to take the course by an exam, without participation in course meetings. (See below for more information.)

There are no separate lectures and exercises. The course meetings (Mon and Thu at 12-15) will mostly consist of discussions and teamwork. There will be a number of cycles in which the teacher presents a problem and the students discuss and analyse it, identify and set their learning objectives, study individually, and then present and discuss the learned content together. All of the above activities except individual studying take place during the course meetings, and therefore participation in them is crucial.

In each cycle, the students will set their own learning objectives and then work to reach them. There will be few regular lectures by the teacher. Instead, the students will be able to order short lectures on topics that they want to learn more about or that they had trouble understanding.

Prerequisites: BSc degree and the course Introduction to Machine Learning or equivalent.

 

Completing the course

The course can be taken either

  • A. by active participation in the course meetings and independent studies between them (see above), and reporting of this work as instructed during the course, OR
  • B. by taking a separate exam.

Mixing these two options is not possible. Either you take the course by active participation, or you take the exam. Either option may include an oral examination.

Option A requires active attendance and reporting throughout the whole course (see Reporting). Grading is then based on the reporting and the exam. If these requirements are not met, option B is the only alternative. In option B, activity and reports do NOT count at all; only the exam does.

To participate in the Data Mining project later this spring, the students must have taken the course (or passed the exam). It is not possible to participate in the project without this course or exam.

 

Literature and material

The course is based on Chapters

  • 6.  Association Analysis: Basic Concepts and Algorithms
  • 7.  Association Analysis: Advanced Concepts

of the book Introduction to Data Mining by Tan, Steinbach, and Kumar (Pearson Education, 2006). See the book website http://www-users.cs.umn.edu/~kumar/dmbook/index.php for material accompanying the book. The final (or separate) exam will be based solely on the two chapters above.

Additional material that may be helpful when studying: