University of Helsinki Department of Computer Science
 

582444 Special Course on Data mining, 3 cu

29 Oct - 4 Dec 2003, Wed and Thu, 12-14
Lecture room A217, Teollisuuskatu 23, Vallila
http://www.cs.helsinki.fi/u/goethals/dmcourse

Lecturer: Bart Goethals
Assistant: Taneli Mielikäinen

Course description

  • This course gives an overview of pattern discovery in data mining and knowledge discovery in databases (KDD), from both a theoretical and a practical point of view.
    Topics include several algorithms and techniques for discovering frequent itemsets, association rules, episodes, and integrity constraints.

  • Part of the course consists of project work: implementing an algorithm (or two).

  • The course is lectured in English, and all course material is also in English.

Course material

  • course notes: "Knowledge Discovery in Databases: Search for Frequent Patterns"
  • copies of slides (see schedule)
  • original articles (on condensed representations)

Course schedule and content

Wed 29 Oct 03  Course overview + Introduction to data mining and KDD (slides)
Thu 30 Oct 03  Discovery of association rules + assignment 1+2+3 (slides)
Wed  5 Nov 03  Algorithms for the discovery of association rules (slides)
Thu  6 Nov 03  Algorithms for the discovery of association rules (slides)
Thu 13 Nov 03  Discovery of episodes + General framework + assignment 4 (slides)
Wed 26 Nov 03  Complexity issues (slides)
Wed  3 Dec 03  Condensed representations (slides)
Thu  4 Dec 03  Summary + questions
Fri 12 Dec 03  Exam, 16h-20h, room 1, main building (deadline: project)
Fri 16 Jan 04  Take home exam (deadline: 25 Jan, midnight)
Tue 20 Apr 04  Take home exam (deadline: 27 Apr, midnight)

Assignments and Project

For the project, you have to write an efficient implementation of the 'Dynamic Itemset Counting' algorithm, as described in the course.
If you want to implement another algorithm, first check with Bart.
The main requirement is that the implementation is efficient.

Several benchmark datasets to test your implementations can be downloaded here.
Some ready-made C++ I/O classes can also be downloaded here.
If you want to check whether the output of your algorithm is correct, you can download existing implementations of some frequent set mining algorithms here.
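
For very small hand-made inputs, the output can also be cross-checked against a brute-force count. The following is only a minimal sketch in Python, not the 'Dynamic Itemset Counting' algorithm and not one of the implementations mentioned above. The function names `frequent_itemsets` and `same_output` are hypothetical, and it assumes each transaction is given as a list of integer item identifiers and that the support threshold is an absolute count.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset (sorted tuple): support count} for every itemset
    contained in at least `min_support` transactions.

    Brute force: enumerates every subset of every transaction, so it is
    only usable on tiny test inputs, not on benchmark datasets.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicates and order
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def same_output(found, reference):
    """Compare two results irrespective of the order of items in a set."""
    def normalise(result):
        return {tuple(sorted(k)): v for k, v in result.items()}
    return normalise(found) == normalise(reference)


# Toy database (an assumption made up for illustration).
toy = [[1, 2, 3], [1, 2], [2, 3], [1, 3], [2]]
reference = frequent_itemsets(toy, min_support=2)
assert reference[(1, 2)] == 2      # {1, 2} occurs in two transactions
assert (1, 2, 3) not in reference  # {1, 2, 3} occurs only once
```

Because the sketch enumerates all subsets, its running time grows exponentially with transaction length; it is meant purely as a correctness check on toy data.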

Office hours

  • Bart:
    • Wed, 14:30-15:00
    • Thu, 14:30-15:00
  • Taneli:
    • Mon, 10-12

IMPORTANT

A student who already has credits for course 581550 (Tietämyksen muodostaminen (3 ov)), dated later than 31.7.2002 (a course lectured by Hannu Toivonen) or before 2000 (lectured by Kilpeläinen or Mannila), can use the credits of only one of the courses 582444 and 581550.

(Likewise, a student who already has credits for course 581550 (Tietämyksen muodostaminen (3 ov)), dated between 1.1.2000 and 31.7.2002 (courses given by Helena Ahonen-Myka, Mika Klemettinen, and Pirjo Moen), can use the credits of only one of the courses 582448 and 581550.)