Data Compression Techniques

Algorithms and machine learning
Advanced studies
Techniques for data compression with an emphasis on text compression including: Huffman and arithmetic coding, Ziv-Lempel compression, Burrows-Wheeler transform and compressed text indexes.


01.03.2012 16.00 B123
Year Semester Date Period Language In charge
2012 spring 17.01-23.02. 3-3 English Juha Kärkkäinen


Time Room Lecturer Date
Tue 12-14 C222 Juha Kärkkäinen 17.01.2012-23.02.2012
Thu 12-14 C222 Juha Kärkkäinen 17.01.2012-23.02.2012
Mon 12-14 B119 Juha Kärkkäinen 06.02.2012-06.02.2012

Exercise groups

Group: 1
Time Room Instructor Date Observe
Wed 12-14 B222 Niko Välimäki 23.01.2012—24.02.2012



Exam Thu 1.3.2012: problems | solutions | results

Registered exercises


  • Feedback available Tuesday, March 27, at 13-13:30 in room B214.

Separate exams

13.04.2012: problems | solutions | grades (Yht = max(KoeS+LhP, KoeS*1.2))

21.08.2012: problems | grades | results

14.09.2012: problems | grades | results

16.11.2012: problems | grades | results


The course introduces techniques for data compression with an emphasis on text compression including Huffman and arithmetic coding, Ziv-Lempel compression, Burrows-Wheeler transform and compressed text indexes.

The students are expected to have basic knowledge on algorithms, data structures, and algorithm analysis. Recommened prequisite course are Data Structures, Models of Computation, Design and Analysis of Algorithms and String Processing Algorithms.

The course is followed by Project in Data Compression Techniques in the next period.

Completing the course

The course consists of lectures, exercises and an exam.

Lectures: Tue and Thu 12-14 in B222.

Attending lectures is not obligatory but it is useful. Lecture notes covering key facts will be posted on this page, but there will be additional examples and explanations during the lectures.

Exercises: Wed 12-14 in B222.

Exercise problems will be available on this page about a week before each exercise session. Model solutions will be available after the exercise session.
The students should solve the problems at home and be prepared to present their solutions at the exercise session. The students are not required to solve all problems, but additional points are awarded according to how many problems have been solved.

Exam: Thu 01.03. at 16:00.

The course exam consists of problems similar to those in the exercises. No notes or other material is allowed in the exam.

  • The maximum score is 50 points.
  • At least 25 points is required to pass the course.

Passing the course or raising the grade is possible later at a renewal exam (where the exercise points affect the grade as with the course exam) and at separate exams (where exercise points are ignored).


The grading is based on the points from the exercises and the exam.

  • 30 points is required to pass and gives the lowest grade 1.
  • 50 points gives the highest grade 5.


Literature and material

The course does not follow any book. All essential material can be found in the lecture notes and in the exercise problems and solutions.

Lecture Notes

  1.  Lecture 17.01. (pages 1-20): PDF | PS (4 slides/page) Change: Added probabilities to Huffman tree example (slide 16).
  2.  Lecture 19.01. (pages 21-31): PDF | PS (4 slides/page) Change: Added the term "dyadic interval" (slides 22-23) and an extra slide on intervals for integer codes (slide 31).
  3. Lecture 24.01. (pages 32-41): PDF | PS (4 slides/page)
  4. Lecture 26.01. (pages 42-55): PDF | PS (4 slides/page)
  5. Lecture 31.01. (pages 56-71): PDF | PS (4 slides/page) Erratum: The definition of kth order entropy was incorrect (slide 67). Addition: an example of kth order entropy (slide69).
  6. Lecture 02.02. (pages 72-77): PDF | PS (4 slides/page)
  7. Lecture 06.02. (pages 78-87): PDF | PS (4 slides/page)
  8. Lecture 07.02. (pages 88-100): PDF | PS (4 slides/page)
  9. Lecture 14.02. (pages 101-115): PDF | PS (4 slides/page) Erratum: The implementation of the search operation (slide 108) using rank-1 was wrong.
  10. Lecture 16.02. (pages 116-128): PDF | PS (4 slides/page) Erratum: Line 3 in WT-access (slide 125): i changed to r.
  11. Lecture 21.02. (pages 129-140): PDF | PS (4 slides/page)
  12. Lecture 23.02. (pages 141-146): PDF | PS (4 slides/page)


  1. Exercise 1 (25.01.) Problems
  2. Exercise 2 (01.02.) Problems
  3. Exercise 3 (08.02.) Problems
  4. Exercise 4 (15.02.) Problems
  5. Exercise 5 (22.02.) Problems
  6. Additional Exercises covering week 6. Problems


Related books and articles

  • D. Adjeroh, T. Bell, A. Mukherjee. The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching. Springer, 2008.
  • T. Bell, I. H. Witten, J. G. Cleary. Modeling for Text Compression. ACM Computing Surveys 21(4), 1989, pp. 557-591.
  • T. C. Bell, J. G. Cleary, I. H. Witten. Text Compression. Prentice Hall, 1990.
  • D. Hankerson, G. A. Harris, P. D. Johnson, Jr. Introduction to Information Theory and Data Compression. Chapman & Hall, 2003.
  • G. Navarro, V. Mäkinen. Compressed full-text indexes. ACM Computing Surveys, 39(1), Article 2, 2007.
  • K. Sayood. Introduction to Data Compression. Morgan Kaufmann, 2006.
  • I. H. Witten, A. Moffat, T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, 1994.