University of Helsinki Department of Computer Science
 

Department of Computer Science

Department information

 

582487 Data Compression Techniques (4 cr, 2 cu)

Level

The course is an elective Master level course suitable for sub-programmes on Algorithmics and on Bioinformatics and computational biology.

Prerequisites

Data structures -course or equivalent knowledge is required. Some knowledge on discrete mathematics and probability theory is helpful. Courses String Processing Algorithms and Design and Analysis of Algorithms give useful background, but are not required.

Organizers

Lecturer Veli Mäkinen,
Assistant Jarkko Toivonen

News

The course exam has been graded. Results are available here. There will be a feedback session on Friday 12th May at 11-12 in room B230. The next opportunity to take the course exam is on Tuesday 13.6. 16-20 A111.

Lectures: 14.03.-27.04. TUE, THU 10-12 D122

Exercises: 20.03.-28.04. WED 10-12 B119

Lecture script available at Yliopistopaino (Exactum 1. floor) against 5 euros.

A more comprehensive introduction to Information Theory can be found here.

Description

The first part of the course covers the standard techniques in data compression, including Huffman coding, Arithmetic coding, Lempel-Ziv parsing, PPM, Burrows-Wheeler transform, integer codes, etc. The emphasis is on algorithms; how to compress/uncompress efficiently. The necessary mathematical background is also given to explain how good are the above compressors, leading to issues like compressibility, incompressibility, entropies, and Kolmogorov complexity. The second part of the course concentrates on a modern specialization area - data structure compression. This new area has emerged especially from the need of large-scale sequence analysis tasks in bioinformatics: we will learn how to represent a widely used data structure, suffix array, using only the same space as a compressed sequence. This result builds on the techniques from the first part, i.e. Burrows-Wheeler transform and Huffman coding, and on some new useful techniques that are building blocks of several other compressed data structures.

Getting the grade

There will be an exam on Tuesday 2.5 at 9:00-12:00, giving 40 points at the maximum. Active participation to exercises gives maximum 6 extra points (25%->1p,37.5%->2p,50%->3p,62.5%->4p,75%->5p,87.5%->6p). The grading is then based on the total points achieved.

Schedule

Wk Date ScriptTopicsNotes
11TUE 14.31.1.1, 2.1.1Administration, Background, simple codings, zeroth order entropy
11WED 15.32.1.2, 1.1.2Self-delimiting codes, Prefix codes, Krafts InequalityExtra lecture, room B119, 10-12
11THU 16.32.1.3-2.1.5Shannon-Fano coding, Huffman coding, Implementing Huffman coding Lecture slides 1-3
12TUE 21.3, THU 23.3No lectures this week
12WED 22.3Exercise 1Exercise sheet (pdf) Solutions (pdf)
13TUE 28.32.1.6-2.1.7Adaptive Huffman, Optimality of Huffman coding
13WED 29.32.1.8Arithmetic codingExtra lecture in place of an exercise session
13THU 30.31.1.3-1.1.5, 2.2Kolmogorov complexity, Incompressibility: average case sorting lower-bound. Lempel-ZivLecture slides 4-6
14TUE 4.42.2,1.1.1,2.3Implementing Lempel-Ziv using suffix trees, k-th order entropy, static PPM
14WED 5.4Exercise 2Exercise sheet (pdf) Solutions (pdf)
14THU 6.42.3, 2.4Adaptive PPM, Burrows-Wheeler transformLecture slides 7-8
15TUE 11.42.4.1-2.4.3Implementing Burrows-Wheeler: suffix array construction, Compressibility of Burrows-Wheeler transformed text
15WED 12.4Exercise 3Exercise sheet (pdf) Solutions (pdf)
15THU 13.4No lecture (Eastern holiday)
16TUE 18.4No lecture (Eastern holiday)
16WED 19.4No exercises (Eastern holiday)
16THU 20.43.1-3.2Introduction to Data Structure Compression: bit-arrays and binary treesLecture slides 9-10
17TUE 25.43.3, 3.4.1Backward search, compressed suffix arrays using wavelet treesLecture slides 11
17WED 26.4Exercise 4Exercise sheet (pdf) Solutions (pdf)
17THU 27.41,2.1.3,3.1-3.3,4.1.1Image compression - overviewImage compression script by Pasi Fränti
17FRI 28.4Exercise 5Exercise sheet (pdf) Solutions (pdf)
18TUE 2.5EXAM 9:00-12:00Results


Veli Mäkinen
Last updated 28.3.2006
>