Data Mining Project

582635
2
Algorithms and machine learning
Advanced studies
Application of data mining to a data analysis problem. The project covers the whole data mining process, and includes either implementing a data mining algorithm or using a wider range of available implementations. The project is completed by a research report describing and justifying the steps taken and decisions made, and discussing the results obtained. Prerequisites: The course Data Mining. The project can only be taken during the specified period. There are no final exams.
Year Semester Date Period Language In charge
2013 spring 06.05-17.05. 4-4 English Hannu Toivonen

Lectures

Time Room Lecturer Date
Mon 10-12 B222 Hannu Toivonen 06.05.2013-06.05.2013
Mon 10-12 B222 Hannu Toivonen 13.05.2013-13.05.2013
Fri 10-14 B222 Hannu Toivonen 17.05.2013-17.05.2013

Registration for this course starts on Tuesday 19th of February at 9.00.

General

Please send your statement to the course assistant (galbrun at cs.helsinki.fi), specifying your dataset, the kind of pattern you are looking for (sequence mining, subgroup discovery, graph mining, etc.) and your method.

Contrary to the schedule above, the final meeting will most likely take place on Monday, May 20th. More details will be sent in reply to the statement email.

The first meeting will take place on Monday 06.05.2013, 10:00 - 12:00, in room B222. There we will discuss the organisation of the project, possible topic choices and group composition, and agree on the intermediate meetings and the final presentation session.

The aim of the data mining project is to apply the concepts and methods learnt during the data mining course to real-world datasets, and possibly to learn more advanced data mining techniques. In addition to the starting and closing sessions, intermediate meetings will be organised to discuss the advancement of the projects.

The project is short and is therefore intended to be rather intensive: students are expected to start working on the problem of their choice without delay.

Participants are expected to implement a solution of their own.

Students who have an interesting data mining question related to their own research area or personal interests are very welcome to suggest it as a project topic. Otherwise, interested students may continue working with the course data used in some of the problems. Alternatively, students may use datasets and adapt problems suggested online, for instance from previous KDD Cups (see https://www.kaggle.com/competitions or http://www.kdnuggets.com/datasets/). In any case, the suitability of the problem should be agreed upon before the team sets to work.

The students are strongly encouraged to suggest problems of their own.

Oral presentation:

Each student/team will present their work during the closing session. Students may use slides (to be submitted with the final report) and will have up to 15 minutes to present their work, including questions. The presentation should be kept at an appropriate level of detail. In particular, give a clear outline of the implementation but leave out very technical programming points. Also give an overview of the results obtained rather than focusing on only a couple of patterns, although you can of course present some examples in more detail.

You should try to make clear:

What your problem is and how you propose to answer it, i.e., what kind of patterns you propose to look for in the data and how they should answer your question.

How your implementation actually enumerates these patterns, and how you make sure it does not miss any and is efficient.

What kind of patterns you found, how and why you can or cannot use them to answer the original problem, and how you could improve on these results.
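
To illustrate the second point, here is a minimal Python sketch of level-wise (Apriori-style) enumeration. It assumes the patterns are frequent itemsets and that support is an absolute count; the function name `frequent_itemsets` and the data layout are hypothetical, and your own project may well call for a different pattern type or search strategy. The comments show the kind of argument you could make for completeness and efficiency.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Illustrative sketch only: transactions is an iterable of item collections,
    and min_support is an absolute number of transactions (an assumption).
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support]

    result = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        frequent_k = set(level)

        # Join step: unions of two frequent k-itemsets that have size k + 1.
        # Completeness: every frequent (k+1)-itemset has only frequent
        # k-subsets, so it is the union of two of them and is generated here.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == k + 1}

        # Prune step (efficiency): discard a candidate without counting it
        # if any of its k-subsets is not frequent.
        candidates = [c for c in candidates
                      if all(frozenset(sub) in frequent_k
                             for sub in combinations(c, k))]

        level = [c for c in candidates if support(c) >= min_support]
        k += 1
    return result
```

For example, `frequent_itemsets([{"a", "b"}, {"a", "c"}, {"a", "b", "c"}], 2)` returns a dictionary mapping each frequent itemset (as a `frozenset`) to its support count.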

 

Written report:

Students should submit a short report (circa 10 pages) presenting their work in a clear and concise way:

  1. Formulate your problem.
  2. Explain and motivate your proposed solution.
  3. Briefly describe your implementation.
  4. Present your results: what kind of patterns did you find, and are they helpful for solving the original problem? Why?
  5. Report on work organisation (division of work within the team, where applicable) and difficulties faced.

The first three points are quite similar to the expected content of the oral presentation. The last one does not need to be addressed during the presentation.
The report should be submitted as a PDF and should indicate your name.

Implementation code should be submitted along with the report and should be easy to try on a data sample: include basic instructions on how to use the implementation, in particular the command to run it.
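
One way to meet this requirement is a small command-line entry point. The Python sketch below is only an illustration: the script name `mine.py`, the `--min-support` option and the input format (one transaction per line, items separated by whitespace) are assumptions, not course requirements, and the simple item counting stands in for whatever your implementation actually computes.

```python
import argparse
from collections import Counter


def parse_args(argv=None):
    """Parse the command line (hypothetical options, for illustration)."""
    parser = argparse.ArgumentParser(
        description="Count frequent items in a transaction file (sketch).")
    parser.add_argument(
        "data",
        help="input file: one transaction per line, whitespace-separated items")
    parser.add_argument(
        "--min-support", type=int, default=2,
        help="minimum number of transactions an item must occur in")
    return parser.parse_args(argv)


def frequent_items(lines, min_support):
    """Return {item: count} for items occurring in >= min_support lines."""
    counts = Counter()
    for line in lines:
        # Use a set so an item repeated within one transaction counts once.
        counts.update(set(line.split()))
    return {item: n for item, n in counts.items() if n >= min_support}


def main(argv=None):
    args = parse_args(argv)
    with open(args.data, encoding="utf-8") as handle:
        result = frequent_items(handle, args.min_support)
    for item, n in sorted(result.items()):
        print(f"{item}\t{n}")

# To use this as a script, call main() under the usual
# `if __name__ == "__main__":` guard at the end of the file.
```

With that final call added, such a script could be run as `python mine.py sample.txt --min-support 2`, which is the kind of command worth stating in the instructions, together with a small sample file.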

Submission can be made by email to the assistant. The deadline will be agreed upon during the first session.