Project in Practical Machine Learning
Year | Semester | Dates | Period | Language | Person in charge |
---|---|---|---|---|---|
2016 | Spring | 20.01.-27.01. | 3 | English | Johannes Verwijnen |
Lectures
Time | Room | Lecturer | Dates |
---|---|---|---|
Wed 16-18 | CK111 | Johannes Verwijnen | 20.01.2016-27.01.2016 |
Wed 16-18 | C220 | Johannes Verwijnen | 02.03.2016 |
General
NOTE: THE CLASSROOM HAS CHANGED TO CK111!
The purpose of the course is to introduce students to the challenges of applying machine learning in a realistic setting. Students should be able to identify and take into account the "dirtiness" of real online data; select, justify and implement a machine learning algorithm or technique using a programming environment that can run on a web server; and monitor and report the accuracy of their implementation, including a reflection on their choices.
Lecture date | Guest lecture | Course lecture |
---|---|---|
Wed, Jan 20th, 16:15 | Cancelled :( Just me babbling, then | Administrative issues, topic introduction, scope, dirtiness and context, existing tools & libraries |
Wed, Jan 27th, 16:15 | Janne Sinkkonen, PhD, Senior Data Scientist at Reaktor | Expected outcomes |
The course consists of two lectures and project work. The project work can also be done during the summer holiday, but lectures will probably not be held during the summer.
Schedule (preliminary):
Date | Milestone |
---|---|
Jan 20th | First lecture; start planning and researching your topic and data sources |
Jan 27th | Second lecture; have your first meeting with the instructor scheduled |
Jan 28-Feb 2 | First meetings with the instructor |
Mar 2nd | Demo / short presentation of the project |
Mar 1-10 | Second meetings with the instructor |
Mar 20th | Report deadline |
Completing the course
Lecture attendance is not mandatory, but very useful. Slides will be available on this page.
The project will be implemented either individually or in pairs. Everyone will have a meeting with the instructor at the beginning of their project to validate the planned data source and implementation and to explain the expected outcomes in detail. A second meeting will be scheduled roughly halfway through the project to make sure the project is on schedule and to revisit expectations. During the project, guidance and simple clarifications are available by email.
The number of study points awarded depends on the amount of work done on the project. Higher amounts of study points require implementing a machine learning algorithm yourself in the language of your choice, whereas lower amounts can be achieved by using existing libraries. Each student must record their work hours during the project and submit them every Sunday (alternatively, you can simply share an online spreadsheet with the instructor). All project work should be available in a public GitHub repository.
The course is graded based on the written report and the presentation.
Literature and materials
You can find peer support (and the instructor) on the IRC channel #tkt-ppml2016. Please let the instructor know about any other interesting sources that could be added to this list.
Data sources:
- http://en.wikipedia.org/wiki/List_of_financial_data_feeds
- https://ilmatieteenlaitos.fi/avoin-data
- http://www.infotripla.fi/digitraffic/doku.php?id=start_en
- Facebook: "Finnish Open Data Ecosystem" group
ML libraries (in no particular order):
- Java:
- Python:
Places to host your system:
- the department's users server: http://www.cs.helsinki.fi/en/compfac/running-cgi-and-php-scripts-and-use-tomcat-containers
- https://www.heroku.com/
- https://cloud.google.com/appengine/