Project in Probabilistic Models

Course code: 582637
Credits: 2-3
Study line: Algorithms and Machine Learning
Level: Advanced studies
The task in this course is to implement and empirically validate probabilistic modeling techniques on a real-world data analysis problem. The progress of each participant will be monitored weekly, and at the end the participants are also expected to summarize their results by submitting a project report and giving a short talk. Prerequisites: 582636 Probabilistic Models.
Year: 2017
Semester: Spring
Dates: 15.03-03.05.
Period: 4
Language: English
Teacher in charge: Juho-Kustaa Kangas

Lectures

Time       Room  Lecturer       Dates
Wed 16-18  B119  Kustaa Kangas  15.03.2017-12.04.2017
Wed 16-18  B119  Kustaa Kangas  26.04.2017-03.05.2017

Registration for this course starts on Tuesday 16th of February at 9.00.

Information for international students

The course will be held in English.

General

The course instructor is Dr Kustaa Kangas (room: A331, email: juho-kustaa [dot] kangas [at] helsinki [dot] fi).

This course involves project work in probabilistic modeling.

This year, your task is to construct a programme that learns a Bayesian network model (its structure) from a given set of discrete training data. Your solutions are evaluated in two ways: by the accuracy of link predictions and by the accuracy of the predictive distribution. Both are compared with those of the hidden "gold standard" Bayesian network that was used to generate the training data. The gold standard network is not revealed until the very end of the course, but each week every student receives a score describing how close their solutions are to it.
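
As an illustration of what such a programme might involve, here is a sketch of the BIC score, one common scoring function for score-based structure learning on discrete data; the course does not prescribe BIC or any particular method. It assumes the training data are loaded as a list of integer-coded tuples (one value per variable, states numbered from 0) and that a DAG is represented as a dictionary mapping each node to a tuple of its parents. The names `local_bic` and `bic` are hypothetical. Because the score decomposes over nodes, a search procedure such as greedy hill climbing over acyclic graphs only needs to rescore the nodes whose parent sets change.

```python
import math
from collections import Counter


def local_bic(data, child, parents, arities):
    """BIC contribution of one node given a candidate parent set.

    data    -- list of tuples, one integer-coded value per variable (assumed format)
    child   -- index of the child variable
    parents -- tuple of parent variable indices
    arities -- number of states of each variable
    """
    n = len(data)
    # N_jk: how often parent configuration j occurs together with child state k.
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    # N_j: how often parent configuration j occurs at all.
    marginal = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximized log-likelihood: sum over observed (j, k) of N_jk * log(N_jk / N_j).
    loglik = sum(c * math.log(c / marginal[cfg]) for (cfg, _), c in joint.items())
    # Free parameters: (r_child - 1) times the number of parent configurations.
    q = math.prod(arities[p] for p in parents)
    n_params = (arities[child] - 1) * q
    # BIC penalty: (log n / 2) per free parameter.
    return loglik - 0.5 * math.log(n) * n_params


def bic(data, parent_sets, arities):
    """Total BIC of a DAG given as {node: tuple_of_parents}; the score decomposes."""
    return sum(local_bic(data, v, ps, arities) for v, ps in parent_sets.items())
```

For example, with `data = [(0, 0), (0, 0), (1, 1), (1, 1)]` and `arities = (2, 2)`, the graph with an arc from variable 0 to variable 1 scores higher than the empty graph, since the second variable is fully determined by the first.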

Please note that to pass this project course you must also pass the course 582636 Probabilistic Models. If you took Probabilistic Models this spring and your result is still pending, you may register for this project course, but you will not receive credit unless you also pass Probabilistic Models. If you have any questions, please contact the course instructor.

Completing the course

Schedule

Wed 15 March (week 11)

  • Initial lecture

Wed 22 March (week 12)

  • Q+A session

Wed 29 March (week 13)

  • First submission deadline at 8:00 a.m.
  • First feedback session

Wed 5 April (week 14)

  • Second submission deadline at 8:00 a.m.
  • Second feedback session

Wed 12 April (week 15)

  • Extra submission deadline at 8:00 a.m.

Wed 19 April (week 16)

  • Easter holiday, no session

Wed 26 April (week 17)

  • Third submission deadline at 8:00 a.m.
  • Third feedback session

Wed 3 May (week 18)

  • Final predictions deadline at 8:00 a.m.
  • Final report deadline at 8:00 a.m.
  • Final session

Course requirements

Every student is required to

  • submit predictions each week, as outlined below, together with a brief diary of the methods used and the progress made, and participate in the weekly sessions where progress is monitored
  • at the end of the course, give a brief talk describing the methodology used in the submitted solutions
  • submit a final report describing the progress of the project, the methods used, the results and main observations, and the essential technical implementation details
  • submit the source code of the programme used

You may use publicly available software in your project, as long as it is freely available for academic use. Use of your own software will be considered positively in the marking. Note also that every student must submit an individual solution; this is not team work.

Evaluation of the predictions

Your results for the network structure learning will be evaluated using two criteria: link prediction accuracy and predictive distribution modelling accuracy.

For the first evaluation, you need to produce a ranked list of directed arcs in the network. The ranking will be evaluated by the area under the ROC curve (AUC).
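
A minimal sketch of how a ranked arc list can be scored by AUC, under stated assumptions: the course page does not specify the candidate set or how arcs missing from your list are treated, so this code assumes every ordered pair of distinct variables is a candidate and that unlisted arcs are tied below all listed ones. AUC is then the fraction of (true arc, non-arc) pairs in which the true arc is ranked higher, with ties counting one half. The function name `arc_auc` is hypothetical.

```python
def arc_auc(ranked_arcs, true_arcs, n_vars):
    """AUC of a ranked list of directed arcs against a set of true arcs.

    ranked_arcs -- list of (parent, child) pairs, most confident first
    true_arcs   -- set of (parent, child) pairs in the reference network
    n_vars      -- number of variables; candidates are all ordered pairs i != j
    Assumption: arcs missing from ranked_arcs are tied below every listed arc.
    """
    candidates = [(i, j) for i in range(n_vars) for j in range(n_vars) if i != j]
    m = len(ranked_arcs)
    # Higher score means more confident; unlisted candidates all get score 0.
    score = {arc: 0 for arc in candidates}
    for pos, arc in enumerate(ranked_arcs):
        score[arc] = m - pos
    positives = [score[a] for a in candidates if a in true_arcs]
    negatives = [score[a] for a in candidates if a not in true_arcs]
    if not positives or not negatives:
        raise ValueError("AUC is undefined without both true arcs and non-arcs")
    # Count correctly ordered (true arc, non-arc) pairs; ties count one half.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))
```

With three variables and true arcs {(0, 1), (1, 2)}, listing the two true arcs first gives an AUC of 1.0, listing them last gives 0.0, and listing only (0, 1) gives 0.75 because the unlisted true arc (1, 2) ties with the four non-arcs.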

For the second evaluation, you need to produce a predictive probability distribution for a given set of test data vectors. Your distribution over the test vectors is compared with the distribution produced by the gold standard model, using a suitable distance metric.
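
For the second evaluation, a learned network assigns a probability to each complete test vector through the factorization P(x) = product over variables v of P(x_v | x_pa(v)). The sketch below assumes integer-coded data and a DAG given as a dictionary from each node to a tuple of its parents. It estimates the conditional probability tables with a pseudocount `alpha` (Laplace smoothing when `alpha = 1`), so that parent configurations absent from the training data do not receive zero probability, which would make the log-probability undefined. The function names and data layout are assumptions; the course page only says that the result is compared with the gold standard using "a suitable distance metric", without naming it.

```python
import math
from collections import Counter
from itertools import product


def fit_cpts(data, parents, arities, alpha=1.0):
    """Estimate conditional probability tables from integer-coded data.

    parents -- dict: variable index -> tuple of parent indices (a DAG)
    arities -- number of states of each variable
    alpha   -- pseudocount added to every cell; must be > 0 (1.0 = Laplace)
    Returns dict: variable -> {(parent_values, value): probability}.
    """
    cpts = {}
    for v, pa in parents.items():
        counts = Counter((tuple(row[p] for p in pa), row[v]) for row in data)
        table = {}
        # Enumerate every parent configuration, including unseen ones.
        for cfg in product(*(range(arities[p]) for p in pa)):
            n_cfg = sum(counts[(cfg, k)] for k in range(arities[v]))
            for k in range(arities[v]):
                table[(cfg, k)] = (counts[(cfg, k)] + alpha) / (n_cfg + alpha * arities[v])
        cpts[v] = table
    return cpts


def log_prob(x, parents, cpts):
    """Log-probability of one complete vector x: sum_v log P(x_v | x_pa(v))."""
    return sum(math.log(cpts[v][(tuple(x[p] for p in pa), x[v])])
               for v, pa in parents.items())
```

Fitting the graph with an arc from variable 0 to variable 1 on the vectors (0, 0), (0, 0), (1, 1), (1, 1) gives P((0, 0)) = 0.5 × 0.75 = 0.375, and the probabilities of all four possible vectors sum to 1.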

Grading

Grading is based on the effort put into the task, the ability to justify the choices made, and the insights obtained. The quality of your predictions relative to those of other students is not a criterion. Careful analysis of the strengths and weaknesses of the chosen approach and use of your own code will be considered positively.

Substantial use of your own code will be awarded an extra credit.

Literature and materials

In the project you are expected to apply the skills learned in the Probabilistic Models course.

The data needed for the project will be available in the course Moodle page: https://moodle.helsinki.fi/course/view.php?id=24002

If you cannot access the Moodle page, please contact the course instructor ASAP.