# Project in Probabilistic Models

Year | Semester | Date | Period | Language | Person in charge |
---|---|---|---|---|---|
2012 | spring | 13.03-24.04. | 4-4 | English | Antti Honkela |

## Lectures

Time | Room | Lecturer | Date |
---|---|---|---|
Tue 10-12 | C220 | Antti Honkela | 13.03.2012-24.04.2012 |

Registration for this course starts on Tuesday, 21 February, at 9.00.

## Information for international students

The course will be held in English.

## General

The course instructor is Dr Antti Honkela.

This course involves project work in probabilistic modeling. The task in this course is to implement and empirically validate probabilistic modeling techniques on a real-world data analysis problem. The progress of each participant will be monitored weekly, and at the end the participants are also expected to summarize their results by submitting a project report and giving a short talk.

This year, your task is to construct a programme that learns a Bayesian network model (structure) from a given set of discrete training data. There are two types of evaluation: accuracy of link predictions and accuracy of the predictive distribution. These will be compared to those of the (hidden) "gold standard" Bayesian network that was used for generating the training data. The gold standard solution is not revealed until the very end of the course, but each week every student receives a score describing how close their solutions are to the gold standard solution.
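As an illustration of what learning a structure from discrete data can involve, the sketch below computes a BIC score for one node given a candidate parent set. This is only one common scoring approach and is not prescribed by the course; the function name `bic_local_score`, the row-as-dictionary data layout, and the `arity` argument are assumptions made for the example.

```python
import math
from collections import Counter


def bic_local_score(data, child, parents, arity):
    """BIC score of `child` given the candidate parent set `parents`.

    data:    list of dicts mapping variable name -> discrete value (assumed layout)
    parents: tuple of variable names
    arity:   dict mapping variable name -> number of possible values
    """
    joint = Counter()     # counts of (parent configuration, child value)
    marginal = Counter()  # counts of each parent configuration
    for row in data:
        cfg = tuple(row[p] for p in parents)
        joint[(cfg, row[child])] += 1
        marginal[cfg] += 1

    # Maximum-likelihood log-likelihood of the child given its parents.
    loglik = sum(n * math.log(n / marginal[cfg]) for (cfg, _), n in joint.items())

    # Number of free parameters in the conditional probability table.
    num_cfgs = math.prod(arity[p] for p in parents)
    num_params = num_cfgs * (arity[child] - 1)

    return loglik - 0.5 * num_params * math.log(len(data))


# Toy data in which B always copies A.
rows = [{"A": 0, "B": 0}, {"A": 1, "B": 1}] * 4
arity = {"A": 2, "B": 2}
with_parent = bic_local_score(rows, "B", ("A",), arity)
without_parent = bic_local_score(rows, "B", (), arity)
print(with_parent > without_parent)
```

A search procedure would compare such local scores across candidate parent sets; the choice of score and search strategy is left to each student.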

Please note that you need to have successfully passed the course 582636 Probabilistic Models before attending this course. If you attended the course this Spring and the decision is still pending, you may sign up for this project, but cannot participate if you fail to pass the basic course.

## Completing the course

### Schedule

Date | Event |
---|---|
13 March | Initial lecture |
20 March | Lecture (Q+A session) |
25 March | First submission deadline |
27 March | First feedback session |
1 April | Second submission deadline |
3 April | Second feedback session |
10 April | No lecture (Easter) |
15 April | Third submission deadline |
17 April | Third feedback session |
22 April | Final submission deadline |
24 April | Final session |

### Course requirements

Every student is required to

- submit predictions each week as outlined below, together with a brief diary of the methods and progress, and participate in the weekly sessions where progress is monitored
- at the end of the course, give a brief talk describing the methodology used in the submitted solutions
- submit a final report describing the progress during the project, the methods used, the results and main observations made, and the essential implementation details
- submit the source code of the programme used

You may use publicly available software in your project, as long as it is freely available for academic use. Use of your own software will be considered positively in the marking. Also note that every student needs to submit an individual solution; this is not team work.

### Evaluation of the predictions

Your results for the network structure learning will be evaluated using two criteria: link prediction accuracy and predictive distribution modelling accuracy.

For the first evaluation, you need to produce a ranked list of directed arcs in the network. These will be evaluated based on the area under the ROC curve.
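To make the ROC criterion concrete, the following sketch computes the area under the ROC curve for a ranked arc list against a reference arc set. The exact evaluation procedure used on the course is not specified here, so the details are assumptions: every ordered pair of distinct nodes counts as a candidate arc, arcs missing from the ranking are treated as tied at the bottom, and the function name `arc_ranking_auc` is hypothetical.

```python
from itertools import permutations


def arc_ranking_auc(ranked_arcs, true_arcs, nodes):
    """AUC of a ranked list of directed arcs (most confident first).

    ranked_arcs: list of (parent, child) tuples
    true_arcs:   set of (parent, child) tuples in the reference network
    nodes:       iterable of all node names
    """
    # Earlier position in the list means a higher score.
    # Arcs absent from the ranking share the lowest score (an assumption).
    n = len(ranked_arcs)
    score = {arc: n - i for i, arc in enumerate(ranked_arcs)}

    candidates = list(permutations(nodes, 2))  # every ordered pair of nodes
    pos = [score.get(a, 0) for a in candidates if a in true_arcs]
    neg = [score.get(a, 0) for a in candidates if a not in true_arcs]
    if not pos or not neg:
        raise ValueError("AUC needs at least one true arc and one non-arc")

    # AUC equals the probability that a random true arc outranks a random
    # non-arc, with ties counted as one half.
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


truth = {("A", "B"), ("B", "C")}
ranking = [("A", "B"), ("C", "B"), ("B", "C")]
print(arc_ranking_auc(ranking, truth, ["A", "B", "C"]))  # 0.875
```

In this toy case the reversed arc `("C", "B")` is ranked above the true arc `("B", "C")`, which costs one of the eight positive-negative comparisons.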

For the second evaluation, you need to produce a predictive probability distribution for a given set of test data vectors. The produced distribution on the test data vectors is compared to the distribution produced by the gold standard model using a suitable distance metric.
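The distance metric is not named above. As one possible choice, the sketch below uses the Kullback-Leibler divergence between the probabilities the reference model and a submitted model assign to the same test vectors. Both the choice of KL divergence and the normalisation of each list over the test set are assumptions for illustration, and `kl_divergence` is a hypothetical helper.

```python
import math


def kl_divergence(reference, predicted):
    """KL divergence D(reference || predicted) over a shared list of test vectors.

    Both arguments are lists of non-negative probabilities, one per test
    vector, in the same order. Each list is normalised to sum to one over
    the test set (an assumption of this sketch).
    """
    if len(reference) != len(predicted):
        raise ValueError("both lists must cover the same test vectors")
    ref_total = sum(reference)
    pred_total = sum(predicted)

    kl = 0.0
    for r, p in zip(reference, predicted):
        r /= ref_total
        p /= pred_total
        if r > 0:
            if p == 0:
                return math.inf  # predicted model rules out a possible vector
            kl += r * math.log(r / p)
    return kl


print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

A value of zero means the two distributions agree on the test vectors; larger values indicate a worse match.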

### Marking

Marking is based on the effort put into the task and the insights obtained. The quality of the predictions in relation to other students is **not** a criterion. Careful analysis of the strengths and weaknesses of the chosen approach and use of your own code will be considered positively.

## Literature and material

In the project you are expected to apply the skills learned in the Probabilistic Models course.

The data needed for the project will be available on the course Moodle page.