582650 Information-Theoretic Modeling (4 cr)
Department of Computer Science, University of Helsinki

An advanced course in the Algorithms and Machine Learning sub-programme. Students in other programmes, as well as students in mathematics and statistics, are welcome too.

This course provides an introduction to information theory from a computer science perspective. Much of the course can be viewed as working out the consequences of Shannon's central result, the Noiseless Source Coding Theorem. The theoretical results are illustrated with practical data compression systems, from Huffman coding to Rissanen's arithmetic coding. To demonstrate the wide applicability of information-theoretic concepts in intelligent systems, we also discuss information-theoretic principles in statistical modeling, especially the Minimum Description Length (MDL) principle.

Jorma Rissanen, the inventor of the MDL principle (left) receiving a best paper award from Claude Shannon, the father of information theory.

Note: Since the contents of this course overlap with the earlier Three Concepts: Information (3CI) course, it is not possible to include both of them in your degree. This also applies to the project.
Lecturer
Teemu Roos, A332, teemu.roos at cs.helsinki.fi
Teaching assistant
Anupam Arohi, anupam.arohi at cs.helsinki.fi

Language: All course material and lectures will be in English.

Lectures: Period I: 8.9.–16.10. Tue & Fri 10–12 in C222 (except 25.9. in CK112). All the lectures will be recorded on video and made available the following day.

Exercises: 18.9.–16.10. Fri 12–14 in C222.

Final exam: Tue 20.10., 9–12, in B123

The course is followed by a related project in Period II. It is recommended that you take both the course and the project. (More information will be posted later.)

Prerequisites:
  • Elementary calculus (limits and convergence, convexity)
  • Basic probability (random variables, conditional and joint distributions, expectation)
  • Good programming skills (no specific language required)


The schedule below is tentative and subject to change.

If you don't have a high-speed connection, you may prefer the downloadable versions of the videos.
Lecture 1. (Tue 8.9.)
Introduction. Participation obligatory.
• introduction to the course • overview of contents
Slides (PDF) | Video | Exercises #1 (PDF)

Lecture 2. (Fri 11.9.)
Noisy Channel Coding.
• error correcting codes • noisy channel coding theorem • Hamming codes
Slides (PDF) | Video

Lecture 3. (Tue 15.9.)
Mathematical Preliminaries.
• calculus • probability • law of large numbers • Jensen's and Gibbs' inequalities
Slides (PDF) | Video | Exercises #2 (PDF) | files

Lecture 4. (Fri 18.9.)
Source Coding: Theory.
• entropy and information • asymptotic equipartition property (AEP) • noiseless source coding theorem
Slides (PDF) | Video
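As a concrete taste of what this lecture covers, here is a minimal Python sketch of the entropy formula H(X) = −Σ p log₂ p (the function name is our own illustration, not part of the course material):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum(p * log2(p)),
    skipping zero-probability outcomes by convention (0 * log 0 = 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 -- a fair coin carries exactly one bit
print(entropy([0.9, 0.1]))   # ~0.469 -- a biased coin carries less
```

The noiseless source coding theorem then says that H(X) bits per symbol is the best any lossless code can achieve on average.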

Lecture 5. (Tue 22.9.)
Source Coding: Practice.
• symbol codes • Kraft inequality • entropy coding
Slides (PDF) | Video | Exercises #3 (PDF)
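The Kraft inequality from this lecture is easy to check numerically; a small sketch (function name ours, for illustration only):

```python
def kraft_sum(lengths, alphabet_size=2):
    """Kraft sum over proposed codeword lengths: sum of D^(-l).
    A prefix code with these lengths exists if and only if the sum is <= 1."""
    return sum(alphabet_size ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0  -> realizable, e.g. 0, 10, 110, 111
print(kraft_sum([1, 1, 2]))     # 1.25 -> no binary prefix code has these lengths
```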

Lecture 6. (Fri 25.9.)
Special Guest Lecture: Daniel Schmidt
• Minimum Message Length (MML) principle
Slides (PDF) | Video

Lecture 7. (Tue 29.9.)
Source Coding: Practice (contd.).
• Shannon-Fano code • Huffman code • arithmetic coding
Slides (PDF) | Video | Exercises #4 (PDF)
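To see the Huffman algorithm from this lecture in action, here is a compact Python sketch (our own illustrative implementation, not the course's reference code):

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a binary Huffman code for the symbol frequencies in `text`.
    Returns a dict mapping each symbol to its bit-string codeword."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate case: a single distinct symbol still needs one bit
        return {symbol: "0" for symbol in freq}
    # Heap entries: (subtree weight, tie-breaker, {symbol: codeword-so-far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # take the two lightest subtrees...
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}         # ...and merge them,
        merged.update({s: "1" + c for s, c in right.items()})  # prefixing 0/1
        tiebreak += 1
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
    return heap[0][2]

# Frequent symbols ('a' occurs 5 times here) get shorter codewords than rare ones.
code = huffman_code("abracadabra")
print(sorted(code.items()))
```

Note that the resulting codeword lengths satisfy the Kraft inequality with equality, as expected for a complete prefix code.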

Lecture 8. (Fri 2.10.)
Universal Source Coding.
• two-part codes • mixture codes • normalized maximum likelihood
Slides (PDF) | Video
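A simplistic illustration of the two-part coding idea from this lecture, assuming a Bernoulli model whose parameter is encoded at fixed precision (the function and its quantization details are our own sketch, not the course's definitions):

```python
import math

def two_part_codelength(bits, precision_bits=8):
    """Total code length (in bits) of a two-part code for a binary sequence:
    part 1 encodes a Bernoulli parameter at fixed precision,
    part 2 encodes the data given that parameter."""
    n, k = len(bits), sum(bits)
    # Quantize the maximum-likelihood estimate k/n to a grid of 2^precision_bits
    # points, clamping away from 0 and 1 so both symbols remain encodable.
    grid = 2 ** precision_bits
    p = max(1, min(grid - 1, round(k / n * grid))) / grid
    data_bits = -(k * math.log2(p) + (n - k) * math.log2(1 - p))
    return precision_bits + data_bits

print(two_part_codelength([0, 1] * 8))  # 24.0: 8 bits for p = 0.5 plus 16 data bits
```

Mixture codes and normalized maximum likelihood refine this idea by removing the arbitrary choice of parameter precision.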

Lecture 9. (Tue 6.10.)
MDL Principle.
• Occam's razor • old-style & modern MDL
Slides (PDF) | more slides | Video | Exercises #5 (PDF) | data

Lecture 10. (Fri 9.10.)
MDL Principle (contd.).
• example applications
Slides (PDF) | Video

Lecture 11. (Tue 13.10.)
Further Topics
• Kolmogorov complexity • gambling • lossy coding • etc.
Slides (PDF) | Video

Lecture 12. (Fri 16.10.)
Redundant Lecture
• summary of course contents • questions and answers before final exam • introduction to project
Video

Additional Material (by Lecture)

Here you can find recommended (but not required) additional material, organized by lecture.
Lecture 2. Noisy Channel Coding
Information and Entropy
an elementary-level MIT OpenCourseWare course • the material is also organized by topic
material on noise and errors

Lecture 4. Source Coding: Theory
Cover & Thomas, Elements of Information Theory, Chapter 2
(see course folder, C127)

Claude E. Shannon, "A mathematical theory of communication", The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July–October, 1948.
a reprint (with corrections) from Bell Labs

Joaquin Candela, Probability, Information Theory and Bayesian Inference, on videolectures.net
• part 1: probability
• parts 2 & 3: entropy and information
• part 4: Bayesian inference

Lectures 5 & 7. Source Coding: Practice
Witten, Neal & Cleary, "Arithmetic coding for data compression", Communications of the ACM, Vol. 30, No 6, pp. 520–540, 1987.

Cover & Thomas, Elements of Information Theory, Chapter 5
(see course folder, C127)

Lelewer & Hirschberg, "Data compression", ACM Computing Surveys, Vol. 19, No. 3, pp. 261–296, 1987.

Ken Huffman, "Huffman algorithm", Huffman Coding Blog, Dec. 24, 2008.
a mostly informal blog entry written by David Huffman's nephew • relates Huffman codes to mobiles (no, not cell phones)

David Salomon, Data Compression: The Complete Reference, Springer, 2004, 926 pages
the title says it all: a comprehensive reference; more recent editions available

Lecture 6. Minimum Message Length Principle
Lloyd Allison, Minimum Message Length, web resource
brief introduction to basic ideas • many useful links

Lecture 8: Universal Source Coding
Peter Grünwald, The Minimum Description Length Principle, MIT Press, 2007, 504 pages
• part 2: universal coding

Lectures 9 & 10: MDL Principle
Peter Grünwald, The Minimum Description Length Principle, MIT Press, 2007, 504 pages
• chapter 1: learning, regularity, and compression
PDF available

Peter Grünwald, MDL Tutorial, on videolectures.net

Jorma Rissanen, Minimum Description Length Principle, Scholarpedia, 3(8):6727.

Kontkanen & Myllymäki, "A linear-time algorithm for computing the multinomial stochastic complexity", Information Processing Letters, Vol. 103, No. 6, pp. 227–233, 2007.
a simple yet very useful result • quite complex proof

Lecture 11: Further Topics
Li & Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, 1997.
the book about Kolmogorov complexity

John L. Kelly, Jr., "A new interpretation of information rate", The Bell System Technical Journal, Vol. 35, pp. 917–926, 1956.
PDF available

Video Coding Fundamentals & H.264, blog
• image coding • video coding

News

September 8 First lecture (C222); participation obligatory.

September 9 Video of first lecture and first set of exercises available (see Contents below).

September 18 First exercises (C222).

September 25 We have a special guest lecturer, Dr. Daniel Schmidt, who talks about the Minimum Message Length (MML) principle. Note changed room: CK112.

October 20, 9–12 Final exam, B123.

November 20, 16–20 Separate exam in case you missed or failed the first one, A111.


University of Helsinki | Department of Computer Science | Helsinki Institute for Information Technology