Project in Probabilistic Models

Course code: 582637
Credits: 2-3
Line of study: Algorithms and machine learning
Level: Advanced studies
The task in this course is to implement and empirically validate probabilistic modeling techniques on a real-world data analysis problem. The progress of each participant will be monitored weekly, and at the end the participants are also expected to summarize their results by submitting a project report and giving a short talk. Prerequisites: 582636 Probabilistic Models.
Year: 2012
Semester: Spring
Dates: 13.03.-24.04.2012
Period: 4
Language: English
In charge: Antti Honkela

Lectures

Time: Tue 10-12
Room: C220
Lecturer: Antti Honkela
Dates: 13.03.2012-24.04.2012

Registration for this course starts on Tuesday 21st of February at 9.00.

Information for international students

 The course will be held in English.

General

The course instructor is Dr Antti Honkela.

This course involves project work in probabilistic modeling.

This year, your task is to construct a program that learns the structure of a Bayesian network model from a given set of discrete training data. There are two types of evaluation: accuracy of link predictions and accuracy of the predictive distribution. Both are measured against the hidden "gold standard" Bayesian network that was used to generate the training data. The gold standard network is not revealed until the very end of the course, but each week every student receives a score describing how close their solutions are to it.
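The course does not prescribe a structure-learning method. One common score-based approach rates each candidate structure with the Bayesian information criterion (BIC), which decomposes into one term per node and its parent set, so a search procedure only needs to re-score the families it changes. The following is a minimal sketch, assuming the training data are integer-coded tuples (states 0, ..., r-1) with known numbers of states per variable; the function names `family_bic` and `network_bic` are hypothetical and not part of any course material.

```python
import math
from collections import Counter


def family_bic(data, child, parents, arities):
    """BIC contribution of one node (child) given a tuple of parent indices.

    data    -- list of tuples of ints, one tuple per training vector (assumed format)
    arities -- number of states of each variable, indexed by column
    """
    n = len(data)
    # N_jk: counts of (parent configuration j, child state k); N_j: counts of j alone
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximised log-likelihood: sum over observed (j, k) of N_jk * log(N_jk / N_j)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    # Free parameters: (r_child - 1) times the number of parent configurations
    q = math.prod(arities[p] for p in parents)
    k = (arities[child] - 1) * q
    return loglik - 0.5 * k * math.log(n)


def network_bic(data, parent_sets, arities):
    """Total BIC of a structure; parent_sets maps node index -> tuple of parents."""
    return sum(family_bic(data, v, pa, arities) for v, pa in parent_sets.items())


data = [(0, 0), (1, 1)] * 4  # toy data: variable 1 copies variable 0
empty = network_bic(data, {0: (), 1: ()}, [2, 2])
linked = network_bic(data, {0: (), 1: (0,)}, [2, 2])
print(linked > empty)  # True
```

On the toy data the structure with the arc 0 -> 1 scores higher than the empty graph. A search procedure, for example greedy hill-climbing over arc additions, deletions and reversals with an acyclicity check, would sit on top of these scoring functions.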

Please note that you need to have passed the course 582636 Probabilistic Models before attending this course. If you took that course this spring and your grade is still pending, you may sign up for this project, but you cannot participate if you do not pass the basic course.

Completing the course

Schedule

13 March Initial lecture
20 March Lecture (Q&A session)
25 March First submission deadline
27 March First feedback session
1 April Second submission deadline
3 April Second feedback session
10 April No lecture (Easter)
15 April Third submission deadline
17 April Third feedback session
22 April Final submission deadline
24 April Final session

Course requirements 

Every student is required to

  • submit predictions each week, as outlined below, together with a brief diary of the methods used and the progress made, and participate in the weekly sessions where progress is monitored
  • at the end of the course, give a brief talk describing the methodology used in the submitted solutions
  • submit a final report describing the progress during the project, the methods used, the results and main observations, and the essential technical implementation details
  • submit the source code of the programs used

You may use publicly available software in your project, as long as it is freely available for academic use. Using your own software will be considered positively in the marking. Note also that every student must submit an individual solution; this is not team work.

Evaluation of the predictions

Your results for the network structure learning will be evaluated using two criteria: link prediction accuracy and predictive distribution modelling accuracy.

For the first evaluation, you need to produce a ranked list of directed arcs in the network, ordered from the arc you are most confident is present to the least. The ranking will be evaluated by the area under the ROC curve (AUC).
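The exact evaluation script is not given here. As a sketch, the AUC of a ranked arc list can be computed in its Mann-Whitney form: the fraction of (true arc, absent arc) pairs in which the true arc is ranked higher, with ties counting one half. This assumes that nodes are numbered from 0, that self-loops are not candidate arcs, and that any arc left out of the list is tied with the others at the bottom; `arc_auc` is a hypothetical name.

```python
def arc_auc(ranked_arcs, true_arcs, n_nodes):
    """AUC of a ranked list of directed arcs (i, j), most confident first.

    Assumptions: nodes are numbered 0..n_nodes-1, self-loops are not candidates,
    and any arc missing from ranked_arcs is tied with the others at the bottom.
    """
    true_set = set(true_arcs)
    # Earlier in the list -> higher score; unlisted arcs all get score 0
    score = {arc: len(ranked_arcs) - r for r, arc in enumerate(ranked_arcs)}
    candidates = [(i, j) for i in range(n_nodes) for j in range(n_nodes) if i != j]
    pos = [score.get(a, 0) for a in candidates if a in true_set]
    neg = [score.get(a, 0) for a in candidates if a not in true_set]
    if not pos or not neg:
        raise ValueError("AUC needs at least one true and one absent arc")
    # Mann-Whitney form: fraction of (true, absent) pairs ordered correctly, ties count 1/2
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


print(arc_auc([(0, 1), (1, 2)], [(0, 1), (1, 2)], 3))  # 1.0
print(arc_auc([(2, 1), (0, 1)], [(0, 1), (1, 2)], 3))  # 0.5625
```

The pairwise loop is quadratic in the number of candidate arcs, which is acceptable for small networks; for larger ones a rank-sum formulation would be faster.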

For the second evaluation, you need to produce a predictive probability distribution for a given set of test data vectors. The produced distribution on the test data vectors is compared to the distribution produced by the gold standard model using a suitable distance metric.

Marking

Marking is based on the effort put into the task and the insights gained. The quality of your predictions relative to those of other students is not a criterion. Careful analysis of the strengths and weaknesses of the chosen approach, and the use of your own code, will be considered positively.

Literature and material

In the project, you are expected to apply the skills learned in the Probabilistic Models course.

The data needed for the project will be available on the course Moodle page.