582634 Data Mining (4 cu)
Course description
In data mining, large data sets are studied to find new, interesting and useful information. The course gives an overview of the data mining process, the stages of typical data mining tasks, and the methods used. The emphasis is on methods for discovering frequent patterns and on randomization methods for validating results.
9.6.2010. The separate exam 4.6.2010 has been graded. The results are here.
12.5.2010. The course has been graded. The results sheet is available here. The results will show in Weboodi in a few days.
Checklist for exercise, group work and paper points. Status as of 6.5.2010 at 14:45. The link only works from within cs.helsinki.fi and hiit.fi. Please contact Taru if you think your points have not been recorded correctly.
Please give feedback about the course!
Note! The last group work session is held at the normal time, Tue 27.4. 10-12, but the debrief session will be held on Wed 28.4. at 9.00am in room C222.
Prerequisites: basics of machine learning, knowledge of algorithms and data structures, programming skills
Teaching
Lectures: Prof. Juho Rousu (juho.rousu (ät) cs.helsinki.fi). Lecture times: 15.03.-30.04., Monday 12-14, Tuesday 10-12. Lecture room: B222
Exercise sessions: 22.03.- 30.04.: Taru Itäpelto (itapelto (ät) cs.helsinki.fi), Tuesdays 12-14, B222
Course Exam
Tuesday 4.5. at 9-12, Lecture hall B123
Completing the course
The course consists of the following components:
- Lectures
- Group work: completed during the group work session, presented at the exercise session, 15% of the grade
- Exercises: completed at home, reviewed in the exercise sessions, 15% of the grade
- Paper work: reading and writing summaries of scientific papers, 15% of the grade
- Course exam, 55% of the grade. The examined content consists of the lectures and the exercises. Group work and papers are not part of the examined content.
Schedule
Lecture slides
The lectures will mostly follow the book "Introduction to Data Mining" by Tan et al. (see below). Only part of the book will be covered and some additional material will be used.
- Lecture 1 (15.3)
- Lecture 2 (16.3)
- Lecture 3 (22.3)
- Lecture 4 (29.3)
- Lecture 5 (30.3)
- Lecture 6 (12.4)
- Lecture 7 (19.4)
- Lecture 8 (20.4)
- Lecture 9 (26.4)
Exercises & Group work
- Groupwork 1 (23.3)
- Exercises 1 (30.3) Solutions
- Groupwork 2 (13.4)
- Exercises 2 (20.4) Solutions. Note! Some mistakes in the solutions have been corrected. Please download the corrected version.
- Groupwork 3 (27.4). Note! The group work session is held at the normal time, Tue 10-12, with room B222 available until 14.00. The debrief session will be held on Wed 28.4. at 9.00am in room C222.
Papers
Links to the scientific papers to be summarized will appear here. A summary of 2-4 pages, gathering the main contents of a scientific article and rephrasing it in your own words, is to be written for each of the papers. The summaries should have the format of a scientific paper with title, author information (you), an abstract, section titles and references. The summary is to be returned as a PDF file via email to Taru (itapelto (at) cs.helsinki.fi) by the given deadline. Each paper will be graded on a scale of 1-5. Late submissions will be automatically graded down.
Note! The following links will work only from inside the cs.helsinki.fi and hiit.fi domains.
- Paper 1 (deadline Mon March 22 at 23:59)
- Paper 2 (deadline Mon April 12 at 23:59)
- Paper 3 (deadline Mon April 26 at 23:59)
Literature
- Tan, Steinbach, Kumar, Introduction to Data Mining
- Han, Kamber, Data Mining: Concepts and Techniques.
- Mannila, Toivonen, Goethals, Knowledge Discovery in Databases: Search for Frequent Patterns.
- Data mining vocabulary in Finnish: Tiedon louhinnan sanasto