Data Mining Project

582635
2
Algorithms and machine learning
Advanced studies
Application of data mining to a data analysis problem. The project covers the whole data mining process, and includes either implementing a data mining algorithm or using a wider range of available implementations. The project is completed by a research report describing and justifying the steps taken and decisions made, and discussing the results obtained. Prerequisites: The course Data Mining. The project can only be taken during the specified period. There are no final exams.
Year Semester Date Period Language Person in charge
2012 spring 07.05-18.05. 4-4 English Hannu Toivonen

Lectures

Time Room Lecturer Date
Mon 10-12 B222 Hannu Toivonen 07.05.2012-07.05.2012
Mon 10-12 B222 Hannu Toivonen 14.05.2012-14.05.2012
Fri 10-14 B222 Hannu Toivonen 18.05.2012-18.05.2012

Registration for this course starts on Tuesday 21st of February at 9.00.

Completing the course

The closing session has been rescheduled to Monday 21.05.2012, 14:00 - 16:00.

The first meeting will take place on Monday 07.05.2012, 10:00 - 12:00. There we will discuss the organisation of the project, possible topic choices and group composition, and agree on the intermediate meetings and the final presentation session.

The aim of the data mining project is to apply the concepts and methods learnt during the data mining course to real-world datasets. In addition to the starting and closing sessions, intermediate meetings will be organised to discuss the progress of the projects.

The project is short, so it is intended to be rather intensive, and students are expected to start working on the problem of their choice without delay.

Interested students may continue working with the data used in some of the course problems. Alternatively, students may select a task from this year's or a past year's KDD Cup (see http://www.sigkdd.org/kddcup/).

The students are strongly encouraged to suggest problems of their own.

 

Oral presentation:

Each student or team will present their work during the closing session. Students may use slides (to be submitted with the final report) for the presentation and will have up to 15 minutes to present their work, including questions. The presentation should be kept at an appropriate level of detail. In particular, a clear outline of the implementation should be given, but very technical programming points should be left out. An overview of the results obtained should also be given, rather than focusing on only a couple of patterns, although you can of course present some examples in more detail.

You should try to make clear:

what your problem is and how you propose to answer it, i.e., what kind of patterns you propose to look for in the data and how they should answer your question;

how your implementation actually enumerates these patterns, and how you make sure it is efficient and does not miss any;

what kind of patterns you found, how and why you can or cannot use them to answer the original problem, and how you could improve on these results.

 

Written report:

Students should submit a short report (about 10 pages) presenting their work in a clear and concise way:

  1. Formulate your problem.
  2. Explain and motivate your proposed solution.
  3. Briefly describe your implementation.
  4. Present your results: what kind of patterns did you find, and are they helpful for solving the original problem? Why or why not?
  5. Report on work organisation (division of work within the team, where applicable) and difficulties faced.

The first three points are quite similar to the expected content of the oral presentation. The last one does not need to be addressed during the presentation.
The report should be submitted as a PDF and should indicate your name.

Implementation code should be submitted along with the report and should be easy to try on a data sample: include basic instructions on how to use the implementation, in particular the command to run it.
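
As a rough illustration of what "easy to try on a data sample" could look like, here is a minimal Python sketch of a command-line entry point. Everything in it is an assumption made for illustration, not a course requirement: the input format (one whitespace-separated transaction per line), the toy task (counting frequent single items), the `--min-support` option, the built-in sample data, and the function names.

```python
import argparse
from collections import Counter

# Hypothetical built-in sample, used when no data file is given.
SAMPLE = [["bread", "milk"], ["bread", "beer"], ["milk", "bread", "eggs"]]


def frequent_items(transactions, min_support):
    """Return items that occur in at least min_support transactions."""
    counts = Counter()
    for transaction in transactions:
        # Count each item at most once per transaction.
        counts.update(set(transaction))
    return {item: n for item, n in counts.items() if n >= min_support}


def read_transactions(path):
    """Read one whitespace-separated transaction per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.split() for line in handle if line.strip()]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Toy frequent-item counter.")
    parser.add_argument("data", nargs="?", help="path to a sample data file")
    parser.add_argument("--min-support", type=int, default=2)
    args = parser.parse_args(argv)
    # Fall back to the built-in sample so the script runs with no arguments.
    transactions = read_transactions(args.data) if args.data else SAMPLE
    result = frequent_items(transactions, args.min_support)
    for item, count in sorted(result.items()):
        print(f"{item}\t{count}")
    return result


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as `mine.py`, this could be run as `python mine.py sample.txt --min-support 2`, or with no arguments to try it on the built-in sample. Stating such a command in the instructions is the kind of detail that makes an implementation easy to try.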

Submissions can be made by email to the assistant. The deadline will be agreed upon during the first session.