Data Mining Project
| Day and time | Room | Lecturer | Date |
|---|---|---|---|
| Wed 10-12 | B222 | Hannu Toivonen | 04.05.2011 |
| Fri 10-12 | B222 | Hannu Toivonen | 20.05.2011 |
Registration for this course starts on Tuesday 22nd of February at 9.00.
Please note that the first session has been moved to Wednesday 4.5, 10-12.
The aim of the data mining project is to apply the concepts and methods learnt during the data mining course to real-world datasets. In addition to the opening and closing sessions, intermediate meetings will be organised to discuss the progress of the projects.
The project is short, so it is intended to be rather intensive: students are expected to start working on the problem of their choice without delay.
Students who are interested may continue working with the course data used in some of the course problems. Alternatively, students may select a task from this year's or past years' KDD Cups (see http://www.sigkdd.org/kddcup/).
The students are strongly encouraged to suggest problems of their own.
Each student/team will present their work during the closing session (Friday 20.5, 10:00-12:00). Students may use slides (to be submitted with the final report) and will have up to 15 minutes to present their work, including questions. The presentation should be kept at an appropriate level of detail: give a clear outline of the implementation, but leave out very technical programming points. Also give an overview of the results obtained rather than focusing on only a couple of patterns, although you can of course present some examples in more detail.
You should try to make clear:
- what your problem is and how you propose to answer it, i.e., what kind of patterns you propose to look for in the data and how they should answer your question;
- how your implementation actually enumerates these patterns, and how you make sure that it does not miss any and is efficient;
- what kind of patterns you found, how and why you can or cannot use them to answer the original problem, and how you could improve on these results.
Students should submit a short report (circa 10 pages) presenting their work clearly and concisely:
- Formulate your problem,
- Explain and motivate your proposed solution,
- Briefly describe your implementation,
- Present your results: what kinds of patterns did you find, are they helpful for solving the original problem, and why?
- Report on the work organisation (division of work within the team, where applicable) and the difficulties faced.
The first four points closely mirror the expected content of the oral presentation. The last one (work organisation and difficulties) does not need to be addressed during the presentation.
The report should be submitted as a PDF and should include your name.
The implementation code should be submitted along with the report and should be easy to try on a data sample: include basic instructions on how to use it, in particular the command to run it.
Submissions can be made by email to the assistant until Friday 20.5 at 23:59.