Project in Probabilistic Models
2-3
Algorithms and machine learning
Advanced studies
The task in this course is to implement and empirically validate probabilistic modeling techniques on a real-world data analysis problem. The progress of each participant will be monitored weekly, and at the end the participants are also expected to summarize their results by submitting a project report and giving a short talk. Prerequisites: 582636 Probabilistic Models.
Year | Semester | Date | Period | Language | Instructor |
---|---|---|---|---|---|
2015 | spring | 11.03-29.04. | 4-4 | English | Petri Myllymäki |
Lectures
Time | Room | Lecturer | Date |
---|---|---|---|
Wed 16-18 | C220 | Petri Myllymäki | 11.03.2015-29.04.2015 |
Registration for this course starts on Tuesday 17 February at 9.00.
General
The task in the project is to build a probabilistic predictive model based on the given set of training data. More details to be added soon.
Schedule (modified, note that the 1st deadline has been extended!):
- Wed 11.03. First meeting. Walk-through of the project.
- Wed 18.03. Meeting with the supervising team. Introduction to the evaluation method used.
- Wed 25.03. Meeting with the supervising team. Q&A.
- Wed 01.04. Meeting with the supervising team. Q&A.
- Mon 06.04. Deadline of Round 1: submission of first set of solutions.
- Wed 08.04. Meeting: results of round 1, feedback.
- Wed 15.04. No meeting.
- Mon 20.04. Deadline of Round 2: submission of second set of solutions.
- Wed 22.04. Meeting: results of round 2, feedback.
- Mon 27.04. Deadline of Round 3: submission of third set of solutions.
- Wed 29.04. Final meeting: results of round 3, short presentations by each student/team.
- Tue 05.05. Deadline for written report (midnight).
- Wed 06.05. Feedback meeting.
The teaching assistant is Johannes Verwijnen.
Completing the course
To pass the course, you need to:
- build a program that reads a set of training data, and calculates probabilities of new, unseen data vectors
- participate in the weekly meetings
- give a short (5 min) presentation at the final meeting
- write a report of your accomplishments during the course
Grading criteria:
- demonstration of capability to apply modeling methods, innovativeness, versatility
- quality of the produced results
- work effort/productivity during the course
Literature and material
Data:
- The data sets and the row numbers for the test sets are available at http://www.cs.helsinki.fi/u/jverwijn/teaching/PPM15/
- The data is a comma-separated matrix with 303 columns and 67785 rows; each value is an integer measurement from one of the 303 sensors.
- The values are real measurements, discretized to integers. The value 0 has the special meaning "no data". All other values are measurements and should be treated as noisy. You can assume that no value is negative and that all values are below 100.
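As an illustration of the data format described above, the following is a minimal sketch of reading such a matrix and counting the observed values of one column while skipping the "no data" cells. The function names `read_matrix` and `column_counts` and the in-memory sample are hypothetical, and the sketch assumes the file has no header row; it is not part of the course material.

```python
import csv
import io

NUM_VALUES = 100  # assumption: values are the integers 0..99, with 0 meaning "no data"


def read_matrix(handle):
    """Read a comma-separated integer matrix; cells equal to 0 ("no data") become None."""
    rows = []
    for record in csv.reader(handle):
        if not record:  # skip blank lines
            continue
        rows.append([int(cell) or None for cell in record])
    return rows


def column_counts(rows, col):
    """Count how often each value occurs in one column, ignoring missing cells."""
    counts = [0] * NUM_VALUES
    for row in rows:
        value = row[col]
        if value is not None:
            counts[value] += 1
    return counts


# Tiny hypothetical sample standing in for the real 67785 x 303 file.
sample = io.StringIO("3,0,7\n3,5,0\n4,5,7\n")
rows = read_matrix(sample)
print(column_counts(rows, 0)[:5])
```

With the real data, the `io.StringIO` sample would be replaced by an opened file downloaded from the course page.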
Evaluation environment:
- Protocol specification: https://github.com/CloudNSci/cloudnsci-guides
- An example and the evaluation code can be found at https://github.com/verwijnen/ProProMo2015
- For each of the 303 columns in turn, your solution must write a probability distribution over the 100 possible values for each of the 1000 test vectors to its output file (CSV, 1000 rows of 100 values each). After you have "guessed" by providing the distributions, the next input file gives you the actual values of that column for the 1000 test vectors (CSV, one row of 1000 values). These revealed values can be used as information about the test vectors.
- 1st round results are here http://www.cs.helsinki.fi/u/jverwijn/teaching/PPM15/round1.html
- 2nd round results are here http://www.cs.helsinki.fi/u/jverwijn/teaching/PPM15/round2.html
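To make the output format above concrete, here is a minimal baseline sketch: it turns the training values of one column into an additively smoothed distribution over 100 values and writes the same distribution for every test vector. The names `smoothed_distribution` and `write_predictions`, the smoothing parameter `alpha`, and the assumption that the 100 possible values are the integers 0..99 are all illustrative; the sketch does not implement the CloudNSci protocol itself and is not the course's reference solution.

```python
import csv
import io

NUM_VALUES = 100  # assumption: possible values are the integers 0..99
NUM_TEST = 1000   # number of test vectors per column


def smoothed_distribution(column_values, alpha=1.0):
    """Estimate P(value) from training values with additive (Laplace) smoothing."""
    counts = [0] * NUM_VALUES
    for value in column_values:
        counts[value] += 1
    total = len(column_values) + alpha * NUM_VALUES
    return [(count + alpha) / total for count in counts]


def write_predictions(handle, distribution, num_rows=NUM_TEST):
    """Write one CSV row of probabilities per test vector."""
    writer = csv.writer(handle)
    for _ in range(num_rows):
        writer.writerow(distribution)


# Hypothetical training values for a single column.
dist = smoothed_distribution([3, 3, 4, 0, 7])
out = io.StringIO()
write_predictions(out, dist)
print(len(out.getvalue().splitlines()), "rows written")
```

Smoothing keeps every probability strictly positive, so no observed value is ever assigned probability zero. A stronger model would give each test vector its own row, conditioned on the column values revealed so far.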
FAQ:
- What type of models are allowed for solving the prediction task?
  - Any type of model is OK, as long as you produce a probability distribution as the final outcome.
- Can I use available open-source software packages, or do I have to program everything myself?
  - You are allowed to use existing software packages, but writing the program yourself is considered a plus.
- Can we work in teams?
  - You can, in which case you can work on the same program and give a single joint presentation at the final meeting.
  - Each participant will still need to deliver an individual final report (not a joint report; everybody writes a report in his/her own words).
  - The size of the team affects the estimation of the work effort, and for this reason teams of more than two people are not advisable.