Data Mining Project (guided self-study)

582635
2
Algorithms and Machine Learning
Advanced studies
Application of data mining to a data analysis problem. The project covers the whole data mining process and includes either implementing a data mining algorithm or using a wider range of existing implementations. The project concludes with a research report that describes and justifies the steps taken and decisions made, and discusses the results obtained. Prerequisite: the course Data Mining. The project can only be taken during the specified period. There is no final exam.

Year: 2015
Term: Spring
Dates: 04.05.-15.05.2015
Period: 4
Language: English
Teacher in charge: Hannu Toivonen

Lectures

Time: Mon 10-12
Room: B222
Lecturer: Hannu Toivonen
Date: 04.05.2015

Registration for this course starts on Tuesday 17th of February at 9.00.

General

The tasks of this project are:

  1. identify a dataset that you would like to work on
  2. extend your data mining knowledge
  3. implement your own (efficient) frequent pattern mining algorithms that are specifically tailored to your data
  4. find interesting and/or meaningful frequent patterns
  5. write a good report that describes your results

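Task 3 asks for your own frequent pattern mining implementation. As a possible starting point, the following is a minimal sketch of the classic Apriori algorithm for frequent itemsets. It assumes that each transaction is a set of item labels and that support is an absolute count; the function name `apriori` and the toy shopping-basket data are hypothetical illustrations, not part of the course material. A project implementation would be tailored to, and optimized for, the chosen dataset.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least `min_support` transactions,
    mapped to its support count. Hypothetical helper: a minimal Apriori sketch."""
    tsets = [frozenset(t) for t in transactions]

    # Level 1: count single items in one pass over the data.
    counts: Dict[FrozenSet[str], int] = {}
    for t in tsets:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation (simplified join): union pairs of frequent
        # (k-1)-itemsets whose union has exactly k items.
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Prune: every (k-1)-subset must itself be frequent,
                # because support is anti-monotone.
                if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # Support counting: one pass over the data per level.
        level_counts = {c: 0 for c in candidates}
        for t in tsets:
            for c in candidates:
                if c <= t:
                    level_counts[c] += 1

        current = {s: c for s, c in level_counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data: four shopping baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
result = apriori(baskets, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With a minimum support of 2, this prints the three single items plus the pairs {bread, butter} and {bread, milk}; {milk, butter} occurs only once, so the only size-3 candidate is pruned. Apriori scans the data once per level, which becomes expensive on large datasets; if you aim for a very efficient implementation, approaches such as FP-growth (no explicit candidate generation) or Eclat (vertical transaction-id lists) are common alternatives.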
Possible datasets:

  1. The dataset used in the course
  2. KDD Cup datasets, see http://www.sigkdd.org/kddcup/index.php
  3. NYC Taxi data, see e.g. http://www.andresmh.com/nyctaxitrips/
  4. MovieLens dataset with movie ratings, see e.g. http://grouplens.org/datasets/movielens/
  5. Election data, see e.g. http://www.globalelectionsdatabase.com/. Some Finnish data is also available, but you will need to dig up the Finnish election results themselves from other sources.
  6. Your own data!

What you should do by the end of 5 May:

  1. Identify a dataset that you would like to work on and sketch a few notes on what patterns the data might contain.
  2. Decide which approaches you will use for mining the data.
  3. Send a note to Arto (avihavai@cs.helsinki.fi) outlining your data, the expected patterns and your approach. Also include a working title for your project.

Project deadline:

The deadline is up to you, but the project must be completed by the end of May at the latest.

Grading:

The project will be graded fail / 3 / 5. See "Main themes and learning objectives" on the left-hand side for an outline of what is expected of you. For a five, you need to either "Produce genuinely interesting or meaningful results" or "Develops a new pattern type and its implementation, designs a very generic algorithm and its implementation, or makes a very efficient implementation". For a three, you need to meet the criteria in the "Achieves learning objectives" ("Saavuttaa oppimistavoitteet") box.

Report format and structure:

Use the CS department LaTeX template at https://github.com/UniversityHelsinkiTKTL/tktltiki2 for your report. The report should be 10 pages long, including references and any figures. Make sure the report includes the following sections:

  1. Abstract
  2. Introduction
  3. Related work
  4. Methodology
  5. Data
  6. Results
  7. Discussion
  8. Conclusions
  9. Ideas for future work

Additional literature:

See the project page from 2014 for links to literature and possible approaches that you can take.

Meetings:

Meetings will be organized on demand. If you need feedback, contact Arto (avihavai@cs.helsinki.fi). Note that the meeting time listed under Lectures above does not apply.