Projects in Unsupervised Machine Learning

582674
3
Algorithms and machine learning
Advanced studies
Practical implementation of methods taught in the course Unsupervised Machine Learning, in a number of short computer projects. The projects are done in parallel to the course. The project work can be done in addition to or as an alternative to taking the course exam.
Year: 2013
Semester: spring
Date: 14.01-22.02.
Period: 3-3
Language: English
In charge: Jukka-Pekka Kauppi

Information for international students

The course will be completely in English.

 

General

NEW! 6.3. The third assignment has been published. The report is due on 24th of March (Sunday) by midnight!

Older news:

  • The second assignment has been published. The report is due on 28th of February (Thursday) by midnight!
  • There will be no appointment time on Tuesday 29.1.
  • The first assignment has been published (pdf file at the end of this page). The report is due on 8th of February (by midnight). You can also send the report after the deadline, but your points will be reduced by 0.5 points for each extra day (the maximum number of points for each assignment is 10). This system gives you the possibility to keep working after the deadline but also gives credit to those students who start working on the reports early.
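The late-report rule above can be sketched as a small calculation. This is only an illustration, not the official grading procedure: the function name penalized_points is hypothetical, and two details are assumptions that the page does not state, namely that a partly used extra day counts as a full day and that the points cannot drop below zero.

```python
import math

MAX_POINTS = 10.0       # maximum points per assignment (stated on this page)
PENALTY_PER_DAY = 0.5   # reduction per extra day (stated on this page)


def penalized_points(raw_points, days_late):
    """Return the points left after the late penalty.

    Assumptions (not stated on the course page):
    - a partly used extra day counts as a full day,
    - the result is never negative.
    """
    if not 0 <= raw_points <= MAX_POINTS:
        raise ValueError("raw_points must be between 0 and 10")
    # Round partial days up; an early or on-time report has no penalty.
    late_days = max(0, math.ceil(days_late))
    return max(0.0, raw_points - PENALTY_PER_DAY * late_days)
```

For example, under these assumptions a report worth 8 points that arrives three days late would receive 8 - 3 × 0.5 = 6.5 points.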

Notes and basic information:

  • This year the course will take place already in period III!
  • All the relevant information concerning the project work will be available on this web page and in the exercise handouts. In addition, a short description of how to complete the computer assignments and written reports will be given at the beginning of the first exercise session of Unsupervised Machine Learning on Friday 18th of January in C222.
  • There will be no organized exercise sessions in this course, but personal guidance is provided during the projects.
  • Basic skills of Matlab or R are necessary to finish the course! This reference can be helpful for self-study: matlab R reference.

In this course, students will learn how to implement well-known unsupervised machine learning methods using an appropriate scientific computing tool (Matlab or R). Although several machine learning packages are nowadays freely available, it is extremely important to learn practical implementation skills in order to customize existing algorithms as well as to more deeply understand the practical limitations and benefits of the existing methods. The course also teaches scientific reporting skills as the grading of the course will be based on the quality of the written reports.

Practical implementation requires understanding the theory behind the methods sufficiently well. Therefore, it is essential to take this course together with the course Unsupervised Machine Learning, where the theory will be explained in detail. The project assignments will be published on this page soon after the essential theory has been covered in the lectures. There will be three assignments altogether, with the following topics:

  1. Principal component analysis (PCA) and factor analysis. Handed out: January 22 (Tue), deadline: February 8 (Fri). The pdf file is at the end of this page!
  2. Independent component analysis (ICA). Handed out: February 12 (Tue), deadline: February 28 (Thu). The pdf file is at the end of this page!
  3. Clustering and projection methods. Handed out: March 6 (Wed), deadline: March 24 (Sun). The pdf file is at the end of this page!

There will be 2-3 weeks to complete each assignment. Because implementing the methods takes time, it is highly important to start working soon after each assignment has been published! Experience from previous years has shown that the assignments cannot be completed if the work is started only a couple of days before the deadline. We recommend carrying out the project work in pairs (no more than 2 persons!), but it can also be done individually.

Completing the course

For every computer assignment, you will need to:

  • Implement core methods from the course Unsupervised Machine Learning using Matlab or R.
  • Write a report where you present and discuss your solution.

The grade will be based on the written reports, so make them clear and enjoyable to read!

Reports should be sent to Jukka-Pekka Kauppi (jukka-pekka.kauppi{at}helsinki.fi) by the deadlines shown above (by midnight). If you have questions concerning the assignments, send e-mail to Jukka-Pekka Kauppi or come to discuss them in room A313 (Exactum building) on Tuesdays or Wednesdays at 16-17.

 

Literature and material

Assignment 1.

Principal component analysis (PCA) and factor analysis. Here: comp01.pdf

Data needed: digits.txt, noisyDigits.txt.

Functions for visualization: visual.m, visual.R

Assignment 2.

Independent component analysis (ICA). Here: comp02.pdf

Data needed: mixed_images.txt,

Assignment 3.

Clustering and projection methods. Here: comp03.pdf

Data needed: data_proj.txt, images_txt