Data Compression Techniques

Algoritmit ja koneoppiminen
Syventävät opinnot
Techniques for data compression with an emphasis on text compression including: Huffman and arithmetic coding, Ziv-Lempel compression, Burrows-Wheeler transform and compressed text indexes.


01.03.2012 16.00 B123
Vuosi Lukukausi Päivämäärä Periodi Kieli Vastuuhenkilö
2012 kevät 17.01-23.02. 3-3 Englanti Juha Kärkkäinen


Aika Huone Luennoija Päivämäärä
Ti 12-14 C222 Juha Kärkkäinen 17.01.2012-23.02.2012
To 12-14 C222 Juha Kärkkäinen 17.01.2012-23.02.2012
Ma 12-14 B119 Juha Kärkkäinen 06.02.2012-06.02.2012


Group: 1
Aika Huone Ohjaaja Päivämäärä Huomioitavaa
Ke 12-14 B222 Niko Välimäki 23.01.2012—24.02.2012



Exam Thu 1.3.2012: problems | solutions | results

Registered exercises


  • Feedback available Tuesday, March 27, at 13-13:30 in room B214.

Separate exams

13.04.2012: problems | solutions | grades (Yht = max(KoeS+LhP, KoeS*1.2))

21.08.2012: problems | grades | results

14.09.2012: problems | grades | results

16.11.2012: problems | grades | results


The course introduces techniques for data compression with an emphasis on text compression including Huffman and arithmetic coding, Ziv-Lempel compression, Burrows-Wheeler transform and compressed text indexes.

The students are expected to have basic knowledge on algorithms, data structures, and algorithm analysis. Recommened prequisite course are Data Structures, Models of Computation, Design and Analysis of Algorithms and String Processing Algorithms.

The course is followed by Project in Data Compression Techniques in the next period.

Kurssin suorittaminen

The course consists of lectures, exercises and an exam.

Lectures: Tue and Thu 12-14 in B222.

Attending lectures is not obligatory but it is useful. Lecture notes covering key facts will be posted on this page, but there will be additional examples and explanations during the lectures.

Exercises: Wed 12-14 in B222.

Exercise problems will be available on this page about a week before each exercise session. Model solutions will be available after the exercise session.
The students should solve the problems at home and be prepared to present their solutions at the exercise session. The students are not required to solve all problems, but additional points are awarded according to how many problems have been solved.

Exam: Thu 01.03. at 16:00.

The course exam consists of problems similar to those in the exercises. No notes or other material is allowed in the exam.

  • The maximum score is 50 points.
  • At least 25 points is required to pass the course.

Passing the course or raising the grade is possible later at a renewal exam (where the exercise points affect the grade as with the course exam) and at separate exams (where exercise points are ignored).


The grading is based on the points from the exercises and the exam.

  • 30 points is required to pass and gives the lowest grade 1.
  • 50 points gives the highest grade 5.


Kirjallisuus ja materiaali

The course does not follow any book. All essential material can be found in the lecture notes and in the exercise problems and solutions.

Lecture Notes

  1.  Lecture 17.01. (pages 1-20): PDF | PS (4 slides/page) Change: Added probabilities to Huffman tree example (slide 16).
  2.  Lecture 19.01. (pages 21-31): PDF | PS (4 slides/page) Change: Added the term "dyadic interval" (slides 22-23) and an extra slide on intervals for integer codes (slide 31).
  3. Lecture 24.01. (pages 32-41): PDF | PS (4 slides/page)
  4. Lecture 26.01. (pages 42-55): PDF | PS (4 slides/page)
  5. Lecture 31.01. (pages 56-71): PDF | PS (4 slides/page) Erratum: The definition of kth order entropy was incorrect (slide 67). Addition: an example of kth order entropy (slide69).
  6. Lecture 02.02. (pages 72-77): PDF | PS (4 slides/page)
  7. Lecture 06.02. (pages 78-87): PDF | PS (4 slides/page)
  8. Lecture 07.02. (pages 88-100): PDF | PS (4 slides/page)
  9. Lecture 14.02. (pages 101-115): PDF | PS (4 slides/page) Erratum: The implementation of the search operation (slide 108) using rank-1 was wrong.
  10. Lecture 16.02. (pages 116-128): PDF | PS (4 slides/page) Erratum: Line 3 in WT-access (slide 125): i changed to r.
  11. Lecture 21.02. (pages 129-140): PDF | PS (4 slides/page)
  12. Lecture 23.02. (pages 141-146): PDF | PS (4 slides/page)


  1. Exercise 1 (25.01.) Problems
  2. Exercise 2 (01.02.) Problems
  3. Exercise 3 (08.02.) Problems
  4. Exercise 4 (15.02.) Problems
  5. Exercise 5 (22.02.) Problems
  6. Additional Exercises covering week 6. Problems


Related books and articles

  • D. Adjeroh, T. Bell, A. Mukherjee. The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching. Springer, 2008.
  • T. Bell, I. H. Witten, J. G. Cleary. Modeling for Text Compression. ACM Computing Surveys 21(4), 1989, pp. 557-591.
  • T. C. Bell, J. G. Cleary, I. H. Witten. Text Compression. Prentice Hall, 1990.
  • D. Hankerson, G. A. Harris, P. D. Johnson, Jr. Introduction to Information Theory and Data Compression. Chapman & Hall, 2003.
  • G. Navarro, V. Mäkinen. Compressed full-text indexes. ACM Computing Surveys, 39(1), Article 2, 2007.
  • K. Sayood. Introduction to Data Compression. Morgan Kaufmann, 2006.
  • I. H. Witten, A. Moffat, T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, 1994.