Data Mining Project

Algorithms and machine learning
Advanced studies
Application of data mining to a data analysis problem. The project covers the whole data mining process and includes either implementing a data mining algorithm or using a wider range of available implementations. The project concludes with a research report that describes and justifies the steps taken and the decisions made, and discusses the results obtained. Prerequisite: the course Data Mining. The project can only be taken during the specified period. There is no final exam.
Year  Semester  Date           Period  Language  In charge
2014  Spring    05.05.-16.05.  4       English   Fabio Cunial


Time       Room  Lecturer      Date
Tue 14-16  B222  Fabio Cunial  06.05.2014
Mon 10-12  B222  Fabio Cunial  12.05.2014
Fri 10-14  B222  Fabio Cunial  16.05.2014

Registration for this course starts on Tuesday 18th of February at 9.00.


The objectives of this project are:

  1. to gain exposure to advanced concepts and practices in itemset and association rule discovery;
  2. to understand where the field is currently going;
  3. to do something cool that you could put on your CV;
  4. to have fun :-)

Completing the course

The project can be completed by following exactly one of the strategies below, which are mutually exclusive. Regardless of the strategy chosen, the student must submit a detailed report of her work.

  1. (Algorithms) Study one of the papers listed in section "Literature and material", and either:
    1. write a detailed summary of the paper, or
    2. implement the main idea described in the paper, or
    3. improve the theoretical results of the paper.
  2. (Implementations) Perform an in-depth review of the implementations that are currently available for itemset and association rule discovery. In particular, choose one of the options below:
    1. Review the whole state of the art. What is the architecture of such implementations? Do they support parallelism? How do they handle large datasets? Which implementation choices do they make? Which of them performs best on benchmark datasets? Collect and plot performance metrics.
    2. Study the fine details of one specific implementation. Answer the same questions as in point (2.1), but in greater depth. Read and possibly change the source code.
  3. (Datasets) Using the algorithms studied in the Data Mining course, and possibly interacting with a domain expert, design a controlled set of experiments to find semantically meaningful patterns in the course datasets. Perform a detailed analysis of the discovered patterns.
  4. (Applications) Design and implement an innovative application of the algorithms studied in the Data Mining course (for example a smartphone app, a Facebook app, a Gmail app, or a Google Calendar app; for possible inspiration, see e.g. this blog post, this Facebook app, this smartphone app, and this example of app integration: can you do better?). The application must be agreed on beforehand with the instructor, and it must have a well-defined purpose and a clear utility (although it may use existing algorithms and implementations). The student is expected to have prior working knowledge of the technologies required to implement the application.

Strategies (2), (3) and (4) allow students to form groups of at least two people and to submit a joint report.