Project in Information-Theoretic Modeling

Basic information

Course code: 582651

Credit units: 2

Subprogramme: Algorithms and machine learning

Level: Advanced studies

Description:

A programming project applying the techniques learned in the course Information-Theoretic Modeling, which is a prerequisite.

Year	Semester	Date	Period	Language	In charge
2012	autumn	29.10-03.12.	2-2	English

Exercise groups

Group: 1
Time	Room	Instructor	Date	Observe
Tue 10-12	C220	Hannes Wettig	29.10.2012—07.12.2012

General

The course has been graded and the results are available. (21 December 2012)

—

This project course follows upon the lecture course 582650 Information-Theoretic Modeling. You need to have taken the lecture course or acquired the equivalent skills somewhere else. If you have not taken the lecture course and wish to participate in the project, please contact the instructor Hannes Wettig and explain how you have studied the topic.

You will also need good programming skills in a language of your choice. Programming is done in the department's Linux environment.

The course consists of a series of programming problems where the task is to compress a given data set. The work is expected to be done in groups of 2–3 students.

Learning objectives of the project

The purpose of the project is to deepen the skills and understanding gained on the lecture course by more hands-on work that also allows for more concentration on some specific example cases.

In particular, the student

gets practical experience of different coding methods, including implementation-level details
sees the usefulness of different models and algorithms in compressing data
puts concepts from the lecture course more closely into the context of applications in machine learning and data compression.

The exercises include applications of the Minimum Description Length (MDL) principle for model class selection, prediction and denoising problems.

Completing the course

Students are expected to work in groups of two or three throughout the project. To pass the course, the group must

Turn in solutions to four programming tasks, due on weeks 2, 3, 4 and 6 of the course (see below for details). Each solution is a tar file which when unpacked includes a 'run.sh' script that produces all data that have been assigned so far. These packages have to be delivered by tuesday, 9 o'clock by mail (wettig@cs.helsinki.fi) such that they can be checked in time before class.
Give a short presentation about their solutions to each programming task to other students
Turn in a written final report covering all four problems.

All students in the same group receive the same grade for the course. The grade of a group is based on

Written end report (65% of grade)
Compression performance calculated from the sizes of the group's solutions to programming tasks, 1–4 (35% of grade).

The written report should show that the students are able to choose appropriate tools for compressing given data and can justify their choices. Where you choose to use readily available software, you must explain what it does and why this is appropriate for the task at hand.

For the first programming task, the data will be simple and students will be given the model from which is has been generated. The purpose of this task is to make sure that students get a good start and build up some tools they can use on later tasks where finding appropriate models is an essential part of the task. Therefore, for the first tasks students are required to use arithmetic coding. It is allowed (and even recommended) that you use an existing implementation of arithmetic coding, but you must make sure you understand how it works.

The second data set will be a simple time series. The third data set will be from a classification problem. The fourth data set will consist of image data.

You are allowed to use existing software packages, but in your written report you are required to explain the compression algorithms they are based on. You are also required to explain why you think a particular package is a better choice than using some of the methods you learned about in the course. A good explanation here would be based on actually trying something to find out how it works. We try to choose the data sets in such a way that it is possible to beat the performance of off-the-shelf packages by using coding algorithms we know from the course together with some relatively simple models of the data.

Results

There have been three groups. The package sizes in bytes (in case of problem 5 the error penalty) are listed below. The final score is a weighted sum of these numbers, namely 0.1*x₁+ 0.2*x₂ + 0.3*x₃ + 0.4*x_4/5 , where x_i is the package size of the package for problem i. The final reports can also be found below.

Results
	problem 1	problem 2	problem 3	problem 4	problem 5	weighted sum	report
Team A	9468	44629	43683	287183	189724	213740.3	http://www.cs.helsinki.fi/u/wettig/A.pdf
Team B	9344	77373	48062	291237	187155	222184.4	http://www.cs.helsinki.fi/u/wettig/B.pdf
Team C	16244	71258	79067	349815	243103	276743.3	http://www.cs.helsinki.fi/u/wettig/C.pdf

Address: Department of Computer Science, P.O. 68 (Gustaf Hällströmin katu 2b), FI-00014 UNIVERSITY OF HELSINKI, FINLAND
Opening Hours: During spring and autumn semesters Mon - Fri 7.45 - 19.45 (7.45 am - 7.45 pm)
Phone: +358 9 1911 (University switch)
General e-mail: info [at] cs.helsinki.fi
Fax: +358 9 876 4314

Department of Computer Science [pre 2018 site]

University of Helsinki

Faculty of Science