Project in Practical Machine Learning
Year | Semester | Date | Period | Language | Instructor |
---|---|---|---|---|---|
2015 | spring | 14.01-29.05. | 3-3 | English | Johannes Verwijnen |
Lectures
Time | Room | Lecturer | Date |
---|---|---|---|
Wed 16-18 | C222 | Johannes Verwijnen | 14.01.2015-14.01.2015 |
Wed 16-18 | C222 | Johannes Verwijnen | 21.01.2015-21.01.2015 |
General
The purpose of the course is to introduce students to the challenges of machine learning in a realistic setting. Students should be able to identify and take into account the "dirtiness" of real online data; select, justify and implement a machine learning algorithm or technique in a programming environment that can run on a web server; and monitor and report the accuracy of their implementation, including reflection on their choices.
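As a rough illustration of two of these goals, the sketch below cleans a "dirty" feed of raw values and keeps a running accuracy figure for a classifier. Everything in it is an assumption made for illustration: the sample feed, the helper `clean_reading`, and the class `AccuracyMonitor` are hypothetical names, not part of the course material, and a real project would choose its own cleaning rules and accuracy measure.

```python
import math


def clean_reading(raw):
    """Return the value as a float, or None if it is unusable (hypothetical helper)."""
    if raw is None:
        return None
    try:
        # Accept a decimal comma as well as a decimal point, and ignore padding.
        value = float(str(raw).strip().replace(",", "."))
    except ValueError:
        return None
    # Treat NaN and infinities as missing values.
    if math.isnan(value) or math.isinf(value):
        return None
    return value


class AccuracyMonitor:
    """Tracks the fraction of correct predictions recorded so far."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def record(self, predicted, actual):
        self.total += 1
        if predicted == actual:
            self.correct += 1

    def accuracy(self):
        # None until at least one prediction has been recorded.
        return self.correct / self.total if self.total else None


# Made-up sample feed containing typical problems: decimal commas,
# stray whitespace, empty strings, placeholders and missing values.
raw_feed = ["3,5", " 4.1 ", "", "n/a", None, "nan", "5.0"]
cleaned = [v for v in (clean_reading(r) for r in raw_feed) if v is not None]

monitor = AccuracyMonitor()
for predicted, actual in [(1, 1), (0, 1), (1, 1), (0, 0)]:
    monitor.record(predicted, actual)
```

Logging how many records were dropped during cleaning, alongside the accuracy figure, gives useful material for the report.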
Lecture date | Guest lecture | Course lecture |
---|---|---|
Wed, Jan 14th, 16:15 | Janne Sinkkonen, PhD, Senior Data Scientist at Reaktor | Administrative issues |
Wed, Jan 21st, 16:15 | Matti Aksela, PhD, VP, Analytics and Technology at Comptel | Data sources, dirtiness and context, existing tools & libraries, expected outcomes |
Completing the course
Lecture attendance is not mandatory, but each group should arrange for at least one of its members to attend each lecture. Slides will be available on this page.
The project will be implemented in groups of 1-4 students. Each group will meet with the instructor at the beginning of its project to validate the data source and planned implementation and to go through the expected outcomes in detail. Another meeting will be scheduled roughly halfway through the project to ensure that the group is on schedule and to revisit expectations. During the project, guidance and simple clarifications are available via email.
The number of study points awarded depends on the amount of work done on the project. Higher numbers of study points require implementing a machine learning algorithm yourself in a language of your choice, whereas lower numbers can be achieved by using available libraries. Individual work hours need to be recorded during project work and submitted every Sunday (alternatively, you can share an online spreadsheet with the instructor). All project work should be available in a public GitHub repository.
The course is graded based on the written report and presentation.
Preliminary example schedule
Week starting | Tasks |
---|---|
12.1.2015 | Lectures, deciding on project data source and ML algorithm |
19.1.2015 | 1st meeting, starting work, finding hosting environment |
26.1.2015 | Working on implementation |
9.2.2015 | Starting to run the system, start writing report |
16.2.2015 | No more changes to implementation, writing report |
23.2.2015 | Submit report, presentation |
Literature and material
You can find peer support (and the instructor) on the IRC channel #tkt-ppml.
Data Sources:
- http://en.wikipedia.org/wiki/List_of_financial_data_feeds
- https://ilmatieteenlaitos.fi/avoin-data
- http://www.infotripla.fi/digitraffic/doku.php?id=start_en
ML libraries (in no particular order):
- Java:
- Python:
Places to host your system:
- the department's users server: http://www.cs.helsinki.fi/en/compfac/running-cgi-and-php-scripts-and-use-tomcat-containers
- https://www.heroku.com/
- https://cloud.google.com/appengine/