
582650 Information-Theoretic Modeling (4 cr)
Department of Computer Science, University of Helsinki
An advanced course in the Algorithms and Machine Learning
subprogramme. Students in other programmes, as well as students
in mathematics and statistics, are welcome too.
This course provides an introduction to information theory from the
computer science perspective. Much of the course can be viewed as
consequences of Shannon's central result known as the Noiseless Source
Coding Theorem. The theoretical results will be illustrated by various
practical data compression systems from Huffman coding
to Rissanen's arithmetic coding. In order to demonstrate the wide
applicability of information-theoretic concepts in intelligent
systems, we discuss information-theoretic principles in statistical
modeling, and especially the Minimum Description Length (MDL) principle.
[Photo: Jorma Rissanen, the inventor of the MDL principle (left), receiving a best paper award from Claude Shannon, the father of information theory.]


Note: Since the contents of this course overlap with the
earlier
Three Concepts: Information (3CI) course, it is not
possible to include both of them in your degree. This also applies to
the project.
 Lecturer
 Teemu Roos, A332,
teemu.roos at cs.helsinki.fi
 Teaching assistant
 Anupam Arohi,
anupam.arohi at cs.helsinki.fi
Language: All course material and lectures will be in
English.
Lectures: Period I: 8.9.–16.10. Tue & Fri 10–12
in C222 (except 25.9. in CK112). All the lectures will be recorded
on video and made available the following day.
Exercises: 18.9.–16.10. Fri 12–14 in C222.
Final exam: Tue 20.10. 9–12 AM in B123
The course is followed by a related project in Period II. It is
recommended that you take both the course and the project. (More
information will be posted later.)
Prerequisites:
 Elementary calculus (limits and convergence, convexity)
 Basic probability (random variables, conditional and joint
distributions, expectation)
 Good programming skills (no specific language required)
Contents
The schedule below is tentative and subject to change.
If you don't have a high-speed connection, you may prefer
the downloadable
versions of the videos.
 Lecture 1. (Tue 8.9.)
 Introduction.
participation obligatory
• introduction to the course •
overview of contents
slides
video
exercises #1
 Lecture 2. (Fri 11.9.)
 Noisy Channel Coding.
• error correcting codes • noisy channel coding theorem
• Hamming codes
slides
video
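To give a concrete feel for the error-correcting codes in this lecture, here is a small illustrative Python sketch (our own, not part of the lecture material) of the classic Hamming(7,4) code, which encodes 4 data bits into 7 bits and corrects any single-bit error:

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit Hamming codeword
    (positions 1..7, parity bits at positions 1, 2 and 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct (at most) a single-bit error and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity over positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity over positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity over positions 4,5,6,7
    pos = s1 + 2 * s2 + 4 * s3       # syndrome read as a binary number
    if pos:                          # nonzero syndrome = error position
        c[pos - 1] ^= 1              # flip the erroneous bit
    return [c[2], c[4], c[5], c[6]]
```

Flipping any single bit of a codeword and decoding recovers the original data bits; with two errors the plain (7,4) code miscorrects, which is why extended variants add an overall parity bit.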
 Lecture 3. (Tue 15.9.)
 Mathematical Preliminaries.
• calculus • probability •
law of large numbers • Jensen's and
Gibbs' inequalities
slides
video
exercises #2 
files
 Lecture 4. (Fri 18.9.)
 Source Coding: Theory.
• entropy and information • asymptotic equipartition
property (AEP) • noiseless source coding theorem
slides
video
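The entropy bound of the noiseless source coding theorem is easy to check numerically for small examples; the sketch below (illustrative only, not course material) computes the entropy of a discrete source in bits per symbol:

```python
import math

def entropy(p):
    """Shannon entropy H(X) = -sum p(x) log2 p(x), in bits per symbol.
    Terms with p(x) = 0 are skipped, following the convention 0 log 0 = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# A four-symbol source; the noiseless source coding theorem says no
# uniquely decodable code can beat H(X) bits per symbol on average.
p = [0.5, 0.25, 0.125, 0.125]
print(entropy(p))  # 1.75
```

For this dyadic distribution, 1.75 bits per symbol is also exactly achievable by a prefix code with codeword lengths 1, 2, 3, 3, so the bound is tight here.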
 Lecture 5. (Tue 22.9.)
 Source Coding: Practice.
• symbol codes
• Kraft inequality • entropy coding
slides
video
exercises #3
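As a small companion to the Kraft inequality, the following Python sketch (illustrative, not part of the official material) checks whether a given list of codeword lengths is realizable by a binary prefix code:

```python
def kraft_sum(lengths, D=2):
    """Kraft sum over a D-ary alphabet; by the Kraft inequality, a prefix
    code with exactly these codeword lengths exists iff the sum is <= 1."""
    return sum(D ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))       # 1.0 -> a complete binary prefix code exists
print(kraft_sum([1, 1, 2]) <= 1)     # False -> no such prefix code
```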
 Lecture 6. (Fri 25.9.)
 Special Guest Lecture: Daniel Schmidt
• Minimum Message Length (MML) principle
slides
video
 Lecture 7. (Tue 29.9.)
 Source Coding: Practice (contd.).
• Shannon–Fano code • Huffman code
• arithmetic coding
slides
video
exercises #4
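The Huffman construction covered in this lecture fits in a few lines of Python. This is an illustrative sketch of the standard textbook algorithm (repeatedly merge the two lightest subtrees), not a reference implementation:

```python
import heapq

def huffman_code(freqs):
    """Build a binary Huffman code; freqs maps symbol -> weight.
    Returns a dict mapping each symbol to its codeword string."""
    # Each heap item is (weight, tie-breaker, partial code for its subtree).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two lightest subtrees...
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}   # ...get merged,
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))   # prefixing 0/1
        tie += 1
    return heap[0][2]

code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
# dyadic weights -> codeword lengths 1, 2, 3, 3, matching the entropy
```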
 Lecture 8. (Fri 2.10.)
 Universal Source Coding.
• two-part codes • mixture codes • normalized
maximum likelihood
slides
video
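For the Bernoulli model, the normalized maximum likelihood (NML) quantities discussed in this lecture can be computed by direct summation. The sketch below is our own illustration (with 0^0 read as 1, as usual), not course material:

```python
from math import comb, log2

def bernoulli_nml_normalizer(n):
    """Normalizer of the NML distribution for the Bernoulli model with
    n observations: sum over k of binom(n,k) (k/n)^k ((n-k)/n)^(n-k).
    Python evaluates 0**0 as 1, matching the 0^0 = 1 convention."""
    return sum(comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
               for k in range(n + 1))

def nml_codelength(k, n):
    """NML code length (bits) for a binary sequence with k ones out of n:
    maximized log-likelihood plus the log of the normalizer."""
    p = k / n
    neg_loglik = 0.0 if k in (0, n) else -(k * log2(p) + (n - k) * log2(1 - p))
    return neg_loglik + log2(bernoulli_nml_normalizer(n))
```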
 Lecture 9. (Tue 6.10.)
 MDL Principle.
• Occam's razor • old-style & modern MDL
slides 
more slides
video
exercises #5 
data
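Old-style two-part MDL can be illustrated with a crude sketch: first encode the parameter of a Bernoulli model to fixed precision, then the data given the parameter. The precision and the quantization scheme below are arbitrary choices made for illustration only:

```python
from math import log2

def two_part_codelength(k, n, precision_bits=8):
    """Crude old-style two-part code length (bits) for a binary sequence
    with k ones out of n: L(model) + L(data | model). The parameter is
    quantized to precision_bits bits and clipped away from 0 and 1 so
    that the data part stays finite."""
    step = 2 ** -precision_bits
    theta = max(min(round(k / n / step) * step, 1 - step), step)
    data_bits = -(k * log2(theta) + (n - k) * log2(1 - theta))
    return precision_bits + data_bits
```

For a maximally uncertain sequence (32 ones in 64 flips) this gives 8 + 64 = 72 bits, i.e. the raw data length plus the model cost; the interesting cases are skewed sequences, where the data part drops well below n bits.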
 Lecture 10. (Fri 9.10.)
 MDL Principle (contd.).
• example applications
slides
video
 Lecture 11. (Tue 13.10.)
 Further Topics.
• Kolmogorov complexity • gambling • lossy
coding • etc.
slides
video
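Kelly's classic link between gambling and information rate (see also the Kelly 1956 paper under additional material) can be illustrated numerically: at even odds, the optimal doubling rate works out to 1 - H(p). A small sketch, ours and illustrative only:

```python
from math import log2

def kelly_fraction(p, b):
    """Kelly bet fraction for win probability p at odds b-to-1:
    f* = (p*(b+1) - 1) / b."""
    return (p * (b + 1) - 1) / b

def doubling_rate(p, f, b):
    """Expected log2-growth rate of wealth when betting fraction f
    of the bankroll at odds b-to-1 with win probability p."""
    return p * log2(1 + f * b) + (1 - p) * log2(1 - f)

p, b = 0.6, 1.0             # even-money bet, 60% win probability
f = kelly_fraction(p, b)    # 0.2: bet 20% of the bankroll
print(doubling_rate(p, f, b))   # ~0.029 bits per bet, equal to 1 - H(0.6)
```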
 Lecture 12. (Fri 16.10.)
 Redundant Lecture.
• summary of course contents • questions and answers before
final exam • introduction to project
video
Additional Material (by Lecture)
Here you can find recommended (but not required) additional material,
categorized according to (our) lectures.
 Lecture 2. Noisy Channel Coding

 Information and Entropy
 an MIT OpenCourseWare elementary level course •
for material by topic, click here
⇒ material on noise and errors
 Lecture 4. Source Coding: Theory

 Cover & Thomas, Elements of Information Theory, Chapter 2
 (see course folder, C127)
 Claude E. Shannon, "A mathematical theory of communication",
The Bell System Technical Journal, Vol. 27, pp.
379–423, 623–656, July–October 1948.
 ⇒ a reprint (with corrections) from Bell Labs
 Joaquin Candela, Probability, Information Theory and Bayesian Inference,
on videolectures.net
 • part 1: probability
 • parts 2 & 3: entropy and information
 • part 4: Bayesian inference
 Lectures 5 & 7. Source Coding: Practice

 Witten, Neal & Cleary, "Arithmetic coding for data compression", Communications of
the ACM, Vol. 30, No 6, pp. 520–540, 1987.
 Cover & Thomas, Elements of Information Theory, Chapter 5
 (see course folder, C127)
 Lelewer & Hirschberg, "Data compression",
ACM Computing Surveys,
Vol. 19, No. 3, pp. 261–296, 1987.
 Ken Huffman, "Huffman algorithm",
Huffman Coding Blog, Dec. 24, 2008.

a mostly informal blog entry written by David Huffman's nephew
• relates Huffman code to mobiles (no, not cell phones)
 David Salomon, Data Compression: The Complete Reference,
Springer, 2004, 926 pages
 the title says it all: a comprehensive reference;
more recent editions available
 Lecture 6. Minimum Message Length Principle

 Lloyd Allison, Minimum Message Length, web resource
 brief introduction to basic ideas • many useful links
 Lecture 8: Universal Source Coding

 Peter Grünwald, The Minimum Description Length
Principle, MIT Press, 2007, 504 pages
 • part 2: universal coding
 Lectures 9 & 10: MDL Principle

 Peter Grünwald, The Minimum Description Length
Principle, MIT Press, 2007, 504 pages
 • chapter 1: learning, regularity, and compression
⇒ PDF
available
 Peter Grünwald, MDL Tutorial, on videolectures.net
 Jorma Rissanen, Minimum Description Length Principle,
Scholarpedia, 3(8):6727.
 Kontkanen & Myllymäki, "A lineartime algorithm for computing the multinomial stochastic
complexity", Information Processing Letters, Vol. 103,
No. 6, pp. 227–233, 2007.
 a simple yet very useful result • quite complex proof
 Lecture 11: Further Topics

 Li & Vitányi, An Introduction to Kolmogorov Complexity and Its Applications,
Springer, 1997.
 the book about Kolmogorov complexity
 John L. Kelly, Jr., "A new interpretation of information rate",
The Bell System Technical Journal, Vol. 35, pp. 917–926,
1956.
 ⇒
PDF
available
 Video Coding
Fundamentals & H.264, blog
 •
image coding
• video coding



September 8 First lecture (C222); participation obligatory.
September 9 Video of first lecture and first set of exercises
available (see Contents above).
September 18 First exercises (C222).
September 25 We have a special guest lecturer, Dr.
Daniel
Schmidt, who talks about the Minimum Message Length (MML)
principle. Note changed room: CK112.
October 20, 9–12 AM Final exam, B123.
November 20, 16–20 PM Separate exam in case you missed
or failed the first one, A111.

