University of Helsinki Department of Computer Science
 


582637 Project in probabilistic models (2 cr), Todennäköisyysmallien harjoitustyö (2 op) Spring 2009

The language used in the course will be Finnish or English, depending on the audience.

Sessions

12.03.-23.04.: Thu 16-18 in B222

Course instructor: Prof. Petri Myllymäki

Teaching assistant: Mika Urtela

Introduction

This is a new course belonging to the new Algorithms and Machine Learning sub-programme in the Master's programme of the department, and together with 582636 Probabilistic models (4 cr), it forms one of the three optional courses of the sub-programme.

For students in the old Intelligent Systems specialisation area: this course, together with the course 582636 Probabilistic models (Todennäköisyysmallit, 4 cr), replaces the course Three Concepts: Probability (6 cr).

Prerequisites

Students are expected to complete the course 582636 Probabilistic models (4 cr) before taking this course. In exceptional cases (e.g. if you cannot take that course because you do not understand Finnish), please contact the course instructor.

Course Description

This course involves project work in probabilistic modeling. This year, your task is to compete in missing data completion: the winner of the competition is the student who by the end of the course has correctly guessed more missing entries than any other competitor. More precisely, at the beginning you are given a matrix where some of the entries are missing. The competition consists of three rounds. After each round, you are expected to submit your guesses for the missing entries, and we will tell you (but not your competitors) which of your guesses were correct. The data is not binary, so a wrong guess does not reveal the correct value: you learn the correct value only for those entries which you guessed right. Making good guesses already in round 1 is therefore useful, as you can use this information in round 2 (and 3).

The data remains the same throughout the competition, and the winner is the participant who after three rounds has correctly filled the largest part of the missing entries. After the three rounds, you are expected to write a final report of your work, and give a presentation at the final seminar. We will also reveal the source of the data at the seminar.
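As an illustration of how the feedback from one round can be carried into the next, here is a minimal sketch in Python. It assumes that for each missing entry you keep a table of candidate values with model probabilities; the function names, the example values and the per-entry feedback representation are hypothetical and not part of the course material. A correct guess fixes the value, and a wrong guess rules that value out, after which the remaining probabilities are renormalised.

```python
def update_candidates(candidates, guess, was_correct):
    """Update the probability table of one missing entry after feedback.

    candidates: dict mapping each possible value to its current probability.
    guess: the value that was submitted for this entry.
    was_correct: True if the guess was reported as correct.
    """
    if was_correct:
        # The true value is now known with certainty.
        return {guess: 1.0}
    # A wrong guess rules out that value; renormalise the remaining mass.
    remaining = {v: p for v, p in candidates.items() if v != guess}
    if not remaining:
        return {}
    total = sum(remaining.values())
    if total == 0:
        # No probability mass left on the other values: fall back to uniform.
        return {v: 1.0 / len(remaining) for v in remaining}
    return {v: p / total for v, p in remaining.items()}


def best_guess(candidates):
    """Return the most probable value under the current table."""
    return max(candidates, key=candidates.get)


# Hypothetical entry with three possible values and model probabilities.
cell = {"a": 0.5, "b": 0.3, "c": 0.2}
round1 = best_guess(cell)
cell = update_candidates(cell, round1, False)  # suppose the round 1 guess was wrong
round2 = best_guess(cell)
```

Entries whose values become known can also be treated as observed data when the model is re-estimated for the next round.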

The grade of the course depends on the following factors:

  • Your success in the competition (measured with respect to other competitors)
  • The technical quality of the approaches you try during the competition. As this is a project on probabilistic models, we appreciate only solutions based on probabilistic modeling. You can explore alternative approaches as well if you wish, but be warned that in the unlikely event that you win the competition with a non-probabilistic approach, and in your report you describe no attempts with probabilistic models, you will not get maximum points.
  • The innovativeness and range of your work. We value good imagination and hard work, so if you have good ideas, and take the time to explore many of them, this will earn you more points even if the results for some reason are not as good as you expected.
  • The quality of the final report. In your report, try to bring out the two aspects above and also describe your failures. Extra points if you can analyze the reasons for your potential failures.
  • Quality of your presentation at the final seminar. You are expected to prepare slides for the presentation. Extra points for additional material, like animations etc.

Course schedule

Thu 12.03.2009, 16-18: First meeting, administrative issues, introduction of the course
Thu 19.03.2009, 16-17: Questions and answers, progress monitoring.
Wed 25.03.2009, 9:00: Deadline for Round 1 submissions.
Thu 26.03.2009, 16-18: Results of Round 1.
Wed 01.04.2009, 9:00: Deadline for Round 2 submissions.
Thu 02.04.2009, 16-18: Results of Round 2.
Wed 08.04.2009, 9:00: Deadline for Round 3 submissions.
Thu 09.04.2009, 16-18: Results of Round 3.
Thu 16.04.2009, 16-18: No session.
Wed 22.04.2009, 9:00: Deadline for the final report.
Thu 23.04.2009, 16-18: Final seminar.

Formats

The data is given as a single ASCII file with tab-delimited entries. The number of variables (columns) is 31, and the number of observation vectors (rows) is 2000. The missing entries are denoted by an asterisk (*), and there are 10000 of them. Your submission should be the same file with the asterisks replaced by the guessed values. Name the file "yourlastname_roundX.txt", where X is the number of the round (i.e., 1, 2 or 3). Before the deadline of each round, deliver the file by email (as a separate attachment) to both Petri and Mika, with the subject of the email being the name of the attached file.
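To make the file format concrete, the following Python sketch parses tab-delimited text, replaces each asterisk with a guess, and writes the result back in the same layout. The guessing rule used here (the most frequent observed value of each column) is only a placeholder baseline chosen for illustration, not a suggested solution; all function names, the tiny sample matrix and the last name "virtanen" are made up for the example.

```python
import csv
import io
from collections import Counter

MISSING = "*"  # marker for a missing entry, as stated in the format description


def read_matrix(text):
    """Parse tab-delimited text into a list of rows (lists of strings)."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def fill_with_column_mode(rows):
    """Replace each missing entry with the most frequent observed value of its column.

    Every column is treated as an independent categorical variable,
    which is only a baseline for illustration.
    """
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        counts = Counter(row[j] for row in rows if row[j] != MISSING)
        # Keep the marker if a column has no observed values at all.
        modes.append(counts.most_common(1)[0][0] if counts else MISSING)
    return [[modes[j] if v == MISSING else v for j, v in enumerate(row)]
            for row in rows]


def write_matrix(rows):
    """Serialise rows back to tab-delimited text, one row per line."""
    return "".join("\t".join(row) + "\n" for row in rows)


def submission_name(lastname, round_number):
    """Build the required file name: yourlastname_roundX.txt."""
    return f"{lastname}_round{round_number}.txt"


# Tiny made-up sample (3 columns, 4 rows); the real file has 31 columns and 2000 rows.
sample = "1\t2\t*\n1\t*\t3\n*\t2\t3\n0\t2\t1\n"
rows = read_matrix(sample)
filled = fill_with_column_mode(rows)
output_text = write_matrix(filled)
```

For the real data, the text would be read from the downloaded file instead of the sample string, and output_text would be written to a file named with submission_name("virtanen", 1) or similar.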

The data can be found at http://www.cs.helsinki.fi/group/cosco/Teaching/Probability/2009/Project/incomplete_data.txt.

The final report should be a single PDF file, delivered by email to both Petri and Mika. Please also attach copies of your final seminar slides (as PDF) and your program source code as a separate file. The language used in the final report can be either Finnish or English.

Delivery addresses: petri.myllymaki and mika.urtela, both at cs.helsinki.fi.

Course material

See the material of the course Probabilistic models.


Petri Myllymäki