Data Mining Project

582635
2
Algorithms and machine learning
Advanced studies
Application of data mining to a data analysis problem. The project covers the whole data mining process, and includes either implementing a data mining algorithm or using a wider range of available implementations. The project is completed by a research report describing and justifying the steps taken and decisions made, and discussing the results obtained. Prerequisites: The course Data Mining. The project can only be taken during the specified period. There are no final exams.
Year Semester Date Period Language Person in charge
2014 spring 05.05-16.05. 4-4 English Fabio Cunial

Lectures

Time Room Lecturer Date
Tue 14-16 B222 Fabio Cunial 06.05.2014-06.05.2014
Mon 10-12 B222 Fabio Cunial 12.05.2014-12.05.2014
Fri 10-14 B222 Fabio Cunial 16.05.2014-16.05.2014

Registration for this course starts on Tuesday, 18 February, at 9:00.

General

The objectives of this project are:

  1. to get an exposure to advanced concepts or practices in itemset and association rule discovery;
  2. to understand where the field is currently going;
  3. to do something cool that you could write in your CV;
  4. to have fun :-)

Completing the course

The project can be completed using one of the following mutually exclusive strategies. Regardless of the strategy, the student must submit a detailed report of her activity.

  1. (Algorithms) Study one of the papers listed in section "Literature and material", and either:
    1. write a detailed summary on the paper, or
    2. implement the main idea described in the paper, or
    3. improve the theoretical results of the paper.
  2. (Implementations) Perform an in-depth review of the implementations that are currently available for itemset and association rule discovery. In particular, choose one of the options below:
    1. Review the whole state of the art. What is the architecture of such implementations? Do they support parallelism? How do they handle large datasets? Which implementation choices do they make? Which of them performs best on benchmark datasets? Collect and plot performance metrics.
    2. Study the fine details of one specific implementation. Answer the same questions as in point (2.1), but in greater depth. Read and possibly change the source code.
  3. (Datasets) Using the algorithms studied in the Data Mining course, and possibly interacting with a domain expert, design a controlled set of experiments to find semantically meaningful patterns in the course datasets. Perform a detailed analysis of the discovered patterns.
  4. (Applications) Design and implement an innovative application of the algorithms studied in the Data Mining course (for example a smartphone app, a Facebook app, a Gmail app, or a Google Calendar app -- for possible inspiration, see e.g. this blog post, this Facebook app, this smartphone app, and this example of app integration: can you do better?). The application must be agreed on beforehand with the instructor, and it must have a well-defined purpose and a clear utility (but it can use existing algorithms and implementations). The student is expected to have prior working knowledge of the technologies required to implement the application.

Strategies (2), (3) and (4) allow students to form groups of at least two people and to submit a joint report.
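For students who implement an algorithm (strategy 1.2) or benchmark existing implementations (strategy 2), a small reference miner can be handy for checking that a faster implementation returns the same itemsets on a toy dataset. The following is only a minimal sketch of the classical level-wise (Apriori-style) search, not material from the course itself. The function name `frequent_itemsets`, its parameters, and the input format (transactions given as collections of hashable items, support given as an absolute count) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset contained in
    at least `min_support` transactions (hypothetical helper, level-wise search)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets
        # and keep only those whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting: one pass over the transactions per candidate.
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support

        result.update(current)
        k += 1

    return result
```

The sketch favors readability over speed: support counting rescans all transactions for every candidate, so it is suitable only as a correctness baseline on small inputs, not as a competitor in performance measurements.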