Data Mining Project (guided self-study)
Meeting: Mon 10-12, room B222, Hannu Toivonen, 04.05.2015
Registration for this course starts on Tuesday 17th of February at 9.00.
The tasks of this project are:
- identify a dataset that you would like to work on
- extend your data mining knowledge
- implement your own (efficient) frequent pattern mining algorithms that are specifically tailored for your data
- find interesting and/or meaningful frequent patterns
- write a good report that describes your results
Possible datasets for the project include:
- KDD Cup datasets, see http://www.sigkdd.org/kddcup/index.php
- NYC Taxi data, see e.g. http://www.andresmh.com/nyctaxitrips/
- MovieLens dataset with movie ratings, see e.g. http://grouplens.org/datasets/movielens/
- Election data, see e.g. http://www.globalelectionsdatabase.com/; some Finnish data is also available there, but for Finnish elections you will need to dig up the election results elsewhere
- Your own data!
What you should do before the end of 5 May:
- Identify a dataset that you would like to work on, and sketch a few notes on what the patterns in the data could look like
- Think about the approaches you will use for mining the data
- Send a note to Arto (email@example.com,fi) outlining your data, patterns and approach. Also include a working title for your project.
The schedule is up to you, but the project must be completed by the end of May at the latest.
The project will be graded fail / 3 / 5. See "Main themes and learning objectives" on the left-hand side of the course page for an outline of what is expected of you. For a five, your project must meet one of these criteria: "Produce genuinely interesting or meaningful results" or "Develops a new pattern type and its implementation, designs a very generic algorithm and its implementation, or makes a very efficient implementation". For a three, you need to meet the objectives listed in the "Achieves the learning objectives" (Saavuttaa oppimistavoitteet) box.
Report format and structure:
Use the CS department LaTeX template at https://github.com/UniversityHelsinkiTKTL/tktltiki2 for your report. The report should be 10 pages long, including references and any figures. When writing the report, make sure to include the following:
- Related work
- Ideas for future work
See the 2014 project page for links to literature and possible approaches that you can take.
Meetings will be organized on demand. If you need feedback, contact Arto (firstname.lastname@example.org). Note that the meeting times listed above are not valid.