Data Mining

582634
5
Algorithms and machine learning
Advanced studies
This course focuses on concepts and methods for frequent pattern discovery, also known as association analysis. This edition of the course is a structured, guided self-study course with weekly tasks, supervision, and mandatory attendance. Prerequisites: BSc degree and the course Introduction to Machine Learning or equivalent. Course book: Tan P., Steinbach M. & Kumar V.: Introduction to Data Mining, Chapters 6 and 7. Addison Wesley, 2006.

Exam

03.05.2011 16.00 A111
Year Semester Date Period Language Person in charge
2011 spring 14.03-28.04. 4-4 English Hannu Toivonen

Lectures

Time Room Lecturer Date
Mon 9-12 B222 Hannu Toivonen 14.03.2011-28.04.2011
Thu 9-12 B222 Hannu Toivonen 14.03.2011-28.04.2011

Registration for this course starts on Tuesday 22nd of February at 9.00.

Information for international students

The course will be taught in English. The exam can be taken in English or Finnish. A Swedish exam is also available by advance request.

General

Results from the course are available on the department intranet at http://www.cs.helsinki.fi/i/htoivone/DM2011/results29042011.txt

Data mining or knowledge discovery ("tiedon louhinta" in Finnish) is the process of discovering interesting regularities in large masses of data. This course will focus on a fundamental and generic class of regularities, frequent patterns; discovering them is also known as association analysis.

The course uses a problem-based approach where students learn by actively acquiring knowledge and skills, individually and in groups, to solve data mining challenges identified during the course. Participation in the course requires commitment and initiative, as well as regular and active attendance at the course meetings on Mondays and Thursdays at 9-12. An alternative to course participation is to take the course by exam, without participating in the course meetings.

There are no separate lectures and exercises. The course meetings (Mon and Thu at 9-12) will mostly consist of discussions and teamwork. There will be a weekly cycle of the teacher presenting a problem and the students discussing and analysing it, identifying and setting their learning objectives, studying individually, and then presenting and discussing the learned content together. All of the above activities except individual studying take place during the course meetings, and therefore participation in them is crucial.

In each cycle, the students will set their own learning objectives and then work to reach them. There will be few regular lectures by the teacher. Instead, the students will order short lectures on topics that they want to learn more about or had trouble understanding.

Prerequisites: BSc degree and the course Introduction to Machine Learning or equivalent.

 

Completing the course

The course can be taken either

  • A. by active participation in the course meetings and independent studies between them (see above), and reporting of this work as instructed during the course, OR
  • B. by a written exam.

Case A requires active attendance in at least 10 sessions, and a student may fail to deliver at most one acceptable learning journal and one acceptable group report (see Reporting). Grading is based on the journals and on oral and written reporting. If these requirements are not met, case B is the only option. In case B, participation in sessions and delivered journals and reports do NOT count at all; only the exam does.

 

Literature and material

The course is based on Chapters

  • 6.  Association Analysis: Basic Concepts and Algorithms and
  • 7.  Association Analysis: Advanced Concepts

of the book Introduction to Data Mining by Tan, Steinbach, and Kumar (Pearson Education, 2006). See the book website http://www-users.cs.umn.edu/~kumar/dmbook/index.php for material related to the book. The final (or separate) exam will be based solely on the two chapters above.

Additional material that may be helpful when studying: