University of Helsinki, Department of Computer Science
 


582634 Data Mining (4 cu)

Course description

In data mining, large data sets are studied in order to find new, interesting and useful information. The course gives an overview of the data mining process, the stages of typical data mining tasks and the methods used. The emphasis is on methods for discovering frequent patterns and on randomization methods for validating results.

9.6.2010. The separate exam of 4.6.2010 has been graded. The results are here.

12.5.2010. The course has been graded. The results sheet is available here. The results will show in Weboodi in a few days.

Checklist for exercise, group work and paper points. Status as of 6.5.2010 at 14:45. The link only works from within the cs.helsinki.fi and hiit.fi domains. Please contact Taru if you think your points have not been recorded correctly.

Please give feedback about the course!

Note! The last group work session is held at the normal time, Tue 27.4. at 10-12, but the debrief session will be held on Wed 28.4. at 9.00am in room C222.

Prerequisites: basics of machine learning, knowledge of algorithms and data structures, programming skills

Teaching

Lectures: Prof. Juho Rousu (juho.rousu (at) cs.helsinki.fi). Lecture times: 15.03.-30.04., Monday 12-14 and Tuesday 10-12. Lecture room: B222.

Exercise sessions: Taru Itäpelto (itapelto (at) cs.helsinki.fi), 22.03.-30.04., Tuesdays 12-14, B222.

Course Exam

Tuesday 4.5. at 9-12, Lecture hall B123

Completing the course

The course consists of the following components:

  • Lectures
  • Group work: completed during the group work sessions, presented at the exercise session, 15% of the grade
  • Exercises: completed at home, reviewed in the exercise sessions, 15% of the grade
  • Paper work: reading and writing summaries of scientific papers, 15% of the grade
  • Course exam, 55% of the grade. The exam covers the lectures and the exercises. Group work and papers are not part of the examined content.

The course is graded on a scale of 1-5. 50% of the maximum points gives the grade 1/5, and 80% of the maximum gives the grade 5/5.
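
As a rough illustration of the grading arithmetic, the sketch below combines the four graded components using the weights listed above and compares the result with the two stated thresholds. It is a sketch under assumptions, not the official grading procedure: it assumes the component percentages are combined as a simple weighted sum, and that a total below 50% means the course is not passed. The boundaries for grades 2-4 are not given on this page, so the code does not guess them. The names `WEIGHTS`, `weighted_total` and `grade_band` are hypothetical.

```python
# Weights in percent of the final grade, as listed in the course components.
WEIGHTS = {"group_work": 15, "exercises": 15, "papers": 15, "exam": 55}

PASS_THRESHOLD = 50.0  # percent of maximum points that gives grade 1
TOP_THRESHOLD = 80.0   # percent of maximum points that gives grade 5


def weighted_total(scores):
    """Return the total as a percentage of the maximum points.

    `scores` maps each component name to the fraction (0.0-1.0) of that
    component's maximum points. Assumption: components are combined as a
    plain weighted sum.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    # Round to suppress floating-point noise near the thresholds.
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS), 6)


def grade_band(total_percent):
    """Map a total percentage to a coarse band.

    Only the 50% and 80% thresholds are stated in the course description;
    "fail" below 50% is an assumption, and grades 2-4 are not resolved.
    """
    if total_percent < PASS_THRESHOLD:
        return "fail"
    if total_percent >= TOP_THRESHOLD:
        return "5"
    return "1-4"


if __name__ == "__main__":
    example = {"group_work": 1.0, "exercises": 1.0, "papers": 1.0, "exam": 0.5}
    total = weighted_total(example)
    print(total, grade_band(total))
```

Under these assumptions, full points from group work, exercises and papers contribute 45 percentage points, so the exam alone decides whether the 50% and 80% thresholds are reached.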

Schedule

Lecture slides

The lectures will mostly follow the book "Introduction to Data Mining" by Tan et al. (see below). Only part of the book will be covered and some additional material will be used.

Exercises & Group work

Papers

Links to the scientific papers to be summarized will appear here. A summary of 2-4 pages, gathering the main contents of the scientific article and rephrasing them in your own words, is to be written for each of the papers. The summaries should have the format of a scientific paper, with a title, author information (you), an abstract, section titles and references.

Each summary is to be returned as a PDF file by email to Taru (itapelto (at) cs.helsinki.fi) by the given deadline. Each paper summary will be graded on a scale of 1-5. Late submissions will be automatically graded down.

Note! The following links only work from inside the cs.helsinki.fi and hiit.fi domains.
  1. Paper 1 (deadline Mon March 22 at 23:59)
  2. Paper 2 (deadline Mon April 12 at 23:59)
  3. Paper 3 (deadline Mon April 26 at 23:59)

Literature