582487 Data Compression Techniques (4 cr, 2 cu)
Level
The course is an elective Master level course suitable for sub-programmes on Algorithmics and on Bioinformatics and computational biology.
Prerequisites
Data structures -course or equivalent knowledge is required. Some knowledge on discrete mathematics and probability theory is helpful. Courses String Processing Algorithms and Design and Analysis of Algorithms give useful background, but are not required.
Organizers
Lecturer Veli Mäkinen,
Assistant Jarkko Toivonen
News
The course exam has been graded. Results are available here. There will be a feedback session on Friday 12th May at 11-12 in room B230. The next opportunity to take the course exam is on Tuesday 13.6. 16-20 A111.
Lectures: 14.03.-27.04. TUE, THU 10-12 D122
Exercises: 20.03.-28.04. WED 10-12 B119
Lecture script available at Yliopistopaino (Exactum 1. floor) against 5 euros.
A more comprehensive introduction to Information Theory can be found here.
Description
The first part of the course covers the standard techniques in data compression, including Huffman coding, Arithmetic coding, Lempel-Ziv parsing, PPM, Burrows-Wheeler transform, integer codes, etc. The emphasis is on algorithms; how to compress/uncompress efficiently. The necessary mathematical background is also given to explain how good are the above compressors, leading to issues like compressibility, incompressibility, entropies, and Kolmogorov complexity. The second part of the course concentrates on a modern specialization area - data structure compression. This new area has emerged especially from the need of large-scale sequence analysis tasks in bioinformatics: we will learn how to represent a widely used data structure, suffix array, using only the same space as a compressed sequence. This result builds on the techniques from the first part, i.e. Burrows-Wheeler transform and Huffman coding, and on some new useful techniques that are building blocks of several other compressed data structures.
Getting the grade
There will be an exam on Tuesday 2.5 at 9:00-12:00, giving 40 points at the maximum. Active participation to exercises gives maximum 6 extra points (25%->1p,37.5%->2p,50%->3p,62.5%->4p,75%->5p,87.5%->6p). The grading is then based on the total points achieved.
Schedule
| Wk | Date | Script | Topics | Notes |
|---|---|---|---|---|
| 11 | TUE 14.3 | 1.1.1, 2.1.1 | Administration, Background, simple codings, zeroth order entropy | |
| 11 | WED 15.3 | 2.1.2, 1.1.2 | Self-delimiting codes, Prefix codes, Krafts Inequality | Extra lecture, room B119, 10-12 |
| 11 | THU 16.3 | 2.1.3-2.1.5 | Shannon-Fano coding, Huffman coding, Implementing Huffman coding | Lecture slides 1-3 |
| 12 | TUE 21.3, THU 23.3 | No lectures this week | ||
| 12 | WED 22.3 | Exercise 1 | Exercise sheet (pdf) Solutions (pdf) | |
| 13 | TUE 28.3 | 2.1.6-2.1.7 | Adaptive Huffman, Optimality of Huffman coding | |
| 13 | WED 29.3 | 2.1.8 | Arithmetic coding | Extra lecture in place of an exercise session |
| 13 | THU 30.3 | 1.1.3-1.1.5, 2.2 | Kolmogorov complexity, Incompressibility: average case sorting lower-bound. Lempel-Ziv | Lecture slides 4-6 |
| 14 | TUE 4.4 | 2.2,1.1.1,2.3 | Implementing Lempel-Ziv using suffix trees, k-th order entropy, static PPM | |
| 14 | WED 5.4 | Exercise 2 | Exercise sheet (pdf) Solutions (pdf) | |
| 14 | THU 6.4 | 2.3, 2.4 | Adaptive PPM, Burrows-Wheeler transform | Lecture slides 7-8 |
| 15 | TUE 11.4 | 2.4.1-2.4.3 | Implementing Burrows-Wheeler: suffix array construction, Compressibility of Burrows-Wheeler transformed text | |
| 15 | WED 12.4 | Exercise 3 | Exercise sheet (pdf) Solutions (pdf) | |
| 15 | THU 13.4 | No lecture (Eastern holiday) | ||
| 16 | TUE 18.4 | No lecture (Eastern holiday) | ||
| 16 | WED 19.4 | No exercises (Eastern holiday) | ||
| 16 | THU 20.4 | 3.1-3.2 | Introduction to Data Structure Compression: bit-arrays and binary trees | Lecture slides 9-10 |
| 17 | TUE 25.4 | 3.3, 3.4.1 | Backward search, compressed suffix arrays using wavelet trees | Lecture slides 11 |
| 17 | WED 26.4 | Exercise 4 | Exercise sheet (pdf) Solutions (pdf) | |
| 17 | THU 27.4 | 1,2.1.3,3.1-3.3,4.1.1 | Image compression - overview | Image compression script by Pasi Fränti |
| 17 | FRI 28.4 | Exercise 5 | Exercise sheet (pdf) Solutions (pdf) | |
| 18 | TUE 2.5 | EXAM 9:00-12:00 | Results |
Veli Mäkinen

