Data Mining (guided self study)

582634
5
Algorithms and machine learning
Advanced studies
This course focuses on concepts and methods for frequent pattern discovery, also known as association analysis. This edition is a structured, guided self-study course with weekly tasks, supervision, and mandatory attendance. Prerequisites: BSc degree and the course Introduction to Machine Learning or equivalent. Course book: Tan P., Steinbach M. & Kumar V.: Introduction to Data Mining, Chapters 6 and 7. Addison Wesley, 2006.
Year Semester Date Period Language Teacher in charge
2016 spring 21.01-03.03. 3-3 English Hannu Toivonen

Lectures

Time Room Lecturer Date
Thu 10-12 B222 Hannu Toivonen 21.01.2016-03.03.2016

Exercise groups

Group: 1
Time Room Instructor Date Notes
Thu 12-14 B221 Arto Vihavainen 21.01.2016-03.03.2016

Information for international students

The course will be given in English.

General

This course will familiarize the participants with concepts and methods for identifying interesting patterns in large datasets. Data mining is about trying to make sense of data, usually without clear questions or clear success criteria. The course will focus on the discovery of frequent patterns in data, a fundamental data mining task that can help extract knowledge and previously unknown patterns even from largely unstructured data.

For unofficial IRC guidance, a channel #dm2016 will be set up on IRCNet if needed.

Note that the lab/exercise sessions start right after the first lecture on Thu 21 Jan!

Completing the course

Although this is a self-study course, it contains scheduled activities that must be completed within a given time frame. The course is completed by

  1. carrying out weekly individual assignments and keeping a learning journal,
  2. participating in group work where the groups determine research questions and infer knowledge from a larger data set, and
  3. studying.

The Thursday sessions 10-14 are mandatory and start on 21.1.

There are no traditional lectures; the learning approach taken in the course is self-study and group study.

If you wish to complete the course without participating in any of the activities, you can take a separate exam. See http://www.cs.helsinki.fi/en/exams for the exam schedule.

Literature and material

Course book: Tan P., Steinbach M. & Kumar V.: Introduction to Data Mining, Chapters 6 and 7. Addison Wesley, 2006.

Provisional list of material covered (also in separate exams): Chapters 6 and 7 of Tan et al., except Sections 6.2.4 (Support Counting), 6.3.2 (Rule Generation in Apriori Algorithm), 6.8 (Effect of Skewed Support Distribution), 7.5 (Subgraph Patterns), and 7.6 (Infrequent Patterns).

Individual assignments

  • Set 1 -- due 28.1.2016 10 AM
  • Set 2 -- due 4.2.2016 10 AM
  • Set 3 -- due 11.2.2016 10 AM
  • Set 4 -- due 18.2.2016 10 AM
  • Set 5 -- due 25.2.2016 10 AM
  • Set 6 -- due 3.3.2016 10 AM

Weekly tests

Course guidelines:

Can be found here. Note! Additional details about the group work (esp. peer review) have been added.

Discussion and questions:

You can ask questions about the assignments in Piazza. The sign-up link is piazza.com/helsinki.fi/spring2016/582634 and the actual forum link is piazza.com/helsinki.fi/spring2016/582634/home