Data Mining Project

Algorithms and machine learning
Advanced studies
Application of data mining to a data analysis problem. The project covers the whole data mining process, and includes either implementing a data mining algorithm or using a wider range of available implementations. The project is completed by a research report describing and justifying the steps taken and decisions made, and discussing the results obtained. Prerequisites: The course Data Mining. The project can only be taken during the specified period. There are no final exams.


Year Semester Date Period Language In charge
2011 spring 04.05-20.05. 4-4 English Hannu Toivonen


Time Room Lecturer Date
Wed 10-12 B222 Hannu Toivonen 04.05.2011-04.05.2011
Fri 10-12 B222 Hannu Toivonen 20.05.2011-20.05.2011

Registration for this course starts on Tuesday 22nd of February at 9.00.


Please note that the first session has been moved to Wed 4.5, 10-12.


The aim of the data mining project is to apply the concepts and methods learnt during the data mining course to real-world datasets. In addition to the starting and closing sessions, intermediate meetings will be organised to discuss the advancement of the projects.

The project is short and is therefore intended to be rather intensive. Students are expected to start working on the problem of their choice without delay.

Students who are interested may continue working with the course data used in some of the problems. Alternatively, students may select a task from this year's or a past year's KDD Cup (see

The students are strongly encouraged to suggest problems of their own.


Oral presentation:

Each student/team will present their work during the closing session (Friday 20.5, 10:00-12:00). Students may use slides (to be submitted with the final report) and will have up to 15 minutes to present their work, including questions. The presentation should be kept at an appropriate level of detail: give a clear outline of the implementation, but leave out very technical programming points. Give an overview of the results obtained rather than focusing on only a couple of patterns, although you can of course present some examples in more detail.

You should try to make clear:

what your problem is and how you propose to answer it, i.e., what kind of patterns you propose to look for in the data and how they should answer your question;

how your implementation actually enumerates these patterns, and how you make sure it is efficient and does not miss any;

what kind of patterns you found, how and why you can or cannot use them to answer the original problem, and how you could improve on these results.
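
As an illustration of the second point, here is a minimal sketch of level-wise (Apriori-style) frequent itemset enumeration. It assumes the patterns are frequent itemsets in transaction data, which is only one possible choice of pattern; the function name `frequent_itemsets` and the toy data are hypothetical and not part of the course material. The enumeration is complete because every frequent itemset of size k is the union of two of its frequent subsets of size k-1, and it avoids wasted work by discarding any candidate that has an infrequent subset before counting it.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset occurring in at least
    min_support transactions (hypothetical helper, for illustration only)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Candidate generation: join frequent (k-1)-itemsets, then prune
        # candidates that have any infrequent (k-1)-subset.
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Support counting: one pass over the data per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


# Toy data, made up for this example.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
patterns = frequent_itemsets(toy, min_support=3)
```

In a presentation, the pruning step is the kind of design choice worth explaining: it is what keeps the search efficient without losing any frequent pattern.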


Written report:

Students should submit a short report (circa 10 pages) presenting their work in a clear and concise way. In the report:

  1. Formulate your problem.
  2. Explain and motivate your proposed solution.
  3. Briefly describe your implementation.
  4. Present your results: what kind of patterns did you find, and are they helpful in solving the original problem? Why or why not?
  5. Report on the work organisation (division of work within the team, where applicable) and the difficulties faced.

The first three points are quite similar to the expected content of the oral presentation. The last one does not need to be addressed during the presentation.
The report should be submitted as a PDF and should indicate your name.

Implementation code should be submitted along with the report and should be easy to try on a data sample: include basic instructions on how to use the implementation, in particular the command to run it.

Submissions can be made by email to the assistant until Friday 20.05, 23:59.