Project in Probabilistic Models

582637
2-3
Algorithms and Machine Learning
Advanced studies
The task in this course is to implement and empirically validate probabilistic modeling techniques on a real-world data analysis problem. The progress of each participant will be monitored weekly, and at the end the participants are also expected to summarize their results by submitting a project report and giving a short talk. Prerequisites: 582636 Probabilistic Models.

Exam

02.32
Year Semester Date Period Language Person in charge
2011 spring 17.03-28.04. 4-4 English Antti Honkela

Lectures

Time Room Lecturer Date
Thu 16-18 C220 Petri Myllymäki 17.03.2011-28.04.2011

Registration for this course starts on Tuesday 22nd of February at 9.00.

Information for international students

The course will be held in English.

General

The primary instructor of the course is Dr. Antti Honkela; Prof. Petri Myllymäki is the secondary instructor.

This course involves project work in probabilistic modeling. The task in this course is to implement and empirically validate probabilistic modeling techniques on a real-world data analysis problem. The progress of each participant will be monitored weekly, and at the end the participants are also expected to summarize their results by submitting a project report and giving a short talk.

This year, your task is to construct a programme that learns a Bayesian network model (structure) from a given set of discrete training data. There are two types of evaluation: accuracy of link predictions and accuracy of the predictive distribution. Both will be compared to those of the (hidden) "gold standard" Bayesian network that was used for generating the training data. The gold standard solution is not revealed until the very end of the course, but each week every student is given a score describing how close their solutions are to it.
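
As an illustration of the kind of building block a structure learner typically needs, the following minimal sketch scores one candidate parent set for one variable using the BIC criterion. The choice of BIC, the function name `local_bic`, and the data layout (a list of integer tuples plus a list of state counts) are assumptions made for this example; the course does not prescribe any particular score or format.

```python
import math
from collections import Counter

def local_bic(data, child, parents, arity):
    """BIC score of variable `child` given the parent set `parents`.

    data   : list of tuples of ints, one tuple per training vector (assumed layout)
    child  : column index of the child variable
    parents: list of column indices of the candidate parents
    arity  : list with the number of states of each variable
    """
    n = len(data)
    joint = Counter()  # counts of (parent configuration, child value)
    marg = Counter()   # counts of parent configuration alone
    for row in data:
        cfg = tuple(row[p] for p in parents)
        joint[(cfg, row[child])] += 1
        marg[cfg] += 1
    # Maximized log-likelihood of the child's conditional distribution.
    loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
    # Number of free parameters in the conditional probability table.
    n_params = (arity[child] - 1) * math.prod(arity[p] for p in parents)
    return loglik - 0.5 * math.log(n) * n_params
```

Because the score decomposes over variables, a search procedure can sum `local_bic` over all nodes and compare candidate structures by changing one parent set at a time. The BIC penalty is only one option; a Bayesian score could be substituted in the same place.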

Please note that you need to have passed the course 582636 Probabilistic Models before attending this course. If you attended that course this spring and the result is still pending, you may sign up for this project, but you cannot participate if you fail the basic course.

 

Completing the course

Schedule

17 March Initial lecture 16:15-17:00, the task is published
24 March Lecture (Q+A session)
31 March No lecture
 5 April First submission deadline
 7 April First feedback session
12 April Second submission deadline
14 April Second feedback session
19 April Third submission deadline
21 April No lecture (Easter)
26 April Final submission deadline
28 April Final session (16:15-20:00)

Course requirements 

Every student is required to

  • submit, each week, predictions as outlined below and a brief diary of the methods and progress, and participate in the weekly sessions where the progress is monitored
  • at the end of the course, give a brief talk describing the methodology used in the submitted solutions
  • submit a final report describing the progress during the project, the methods used, the results and main observations made, and the essential technical aspects of the implementation
  • submit the source code of the programme used

You may use publicly available software in your project, as long as it is freely available for academic use. Use of your own software will be considered positively in the marking. Also note that every student needs to submit an individual solution; this is not team work.

Evaluation of the predictions

Your results for the network structure learning will be evaluated using two criteria: link prediction accuracy and predictive distribution modelling accuracy.

For the first evaluation, you need to produce a ranked list of directed arcs in the network. These will be evaluated based on the area under the ROC curve.
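
To make the ROC-based criterion concrete, here is a minimal sketch of how the area under the ROC curve can be computed from a ranked arc list. It assumes that the ranking covers every candidate directed arc exactly once and contains no ties, and that arcs are written as `(parent, child)` tuples; the function name `auc_from_ranking` and this input format are hypothetical, not the course's actual submission format.

```python
def auc_from_ranking(ranked_arcs, true_arcs):
    """Area under the ROC curve for a ranked list of directed arcs.

    ranked_arcs: list of (parent, child) tuples, most confident first,
                 assumed to contain every candidate arc exactly once
    true_arcs  : iterable of (parent, child) tuples present in the reference network
    """
    true_arcs = set(true_arcs)
    n_pos = sum(arc in true_arcs for arc in ranked_arcs)
    n_neg = len(ranked_arcs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one true and one false arc in the ranking")
    negatives_seen = 0
    inverted_pairs = 0  # (true, false) pairs where the false arc is ranked higher
    for arc in ranked_arcs:
        if arc in true_arcs:
            inverted_pairs += negatives_seen
        else:
            negatives_seen += 1
    # AUC equals the fraction of (true, false) pairs ordered correctly.
    return 1.0 - inverted_pairs / (n_pos * n_neg)
```

A value of 1.0 means that every true arc is ranked above every false arc, and a random ordering gives about 0.5. Since the arcs are directed, listing an arc in the wrong direction counts as a false arc under this sketch.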

For the second evaluation, you need to produce a predictive probability distribution for a given set of test data vectors. The produced distribution on the test data vectors is compared to the distribution produced by the gold standard model using a suitable distance metric.
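
The distance metric is not specified here. As one possible choice, the sketch below computes the Kullback-Leibler divergence between the reference probabilities and the submitted probabilities after normalizing both over the same list of test vectors. The use of KL divergence, the normalization step, and the function name `kl_divergence` are all assumptions made for illustration.

```python
import math

def kl_divergence(p_gold, q_model):
    """KL(p || q) in nats, with both inputs normalized over the test vectors.

    p_gold : probabilities of the test vectors under the reference model
    q_model: probabilities of the same vectors under the learned model
    """
    if len(p_gold) != len(q_model):
        raise ValueError("both lists must cover the same test vectors")
    zp, zq = sum(p_gold), sum(q_model)
    total = 0.0
    for p, q in zip(p_gold, q_model):
        p, q = p / zp, q / zq
        if p > 0:
            if q == 0:
                return math.inf  # model assigns zero mass where the reference does not
            total += p * math.log(p / q)
    return total
```

The divergence is zero only when the two normalized distributions agree on every test vector. A model that assigns zero probability to a vector the reference considers possible is penalized without bound, which is one reason to smooth the estimated parameters.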

 

Marking

Marking is based on the effort put into the task and the insights obtained. The quality of the predictions relative to other students is not a criterion. Careful analysis of the strengths and weaknesses of the chosen approach and use of your own code will be considered positively.

Literature and material

In the project you are expected to apply the skills learned in the Probabilistic Models course.

The data needed for the project is available on the course Moodle page.