Seminar on Data Compression

Algorithms and machine learning
Advanced studies
The seminar covers selected topics in data compression. No prior knowledge on data compression is required but students are expected to have basic knowledge on algorithms, data structures, and algorithm analysis.
Year Semester Date Period Language In charge
2017 spring 16.01-01.05. 3-4 English Juha Kärkkäinen


Time Room Lecturer Date
Mon 14-16 C220 Juha Kärkkäinen 16.01.2017-27.02.2017
Mon 14-16 C220 Juha Kärkkäinen 13.03.2017-24.04.2017


NB: The first meeting of this seminar course will be on Monday, 30.01.2017 at 14:00 in room C220.

Lempel-Ziv (LZ) parsing (discovered 40 years ago) is one of the most important algorithmic tools in data compression, both in theory and in the real world. 

In this seminar we will look under the hood of several powerful, LZ-style data compressors, such as LZ4, 7zip, and snappy. These compressors are currently in wide use both by users and as part of larger software systems in industry. The aim is to examine different aspects of the compression process, including: what variants of the parsing are used in real compressors; how fast LZ parsing is carried out in practice; how the parse is encoding; how fast decompression is achieved; use of parallelism.

Completing the course

Mon 30.1.2017

  1. Summarize the performance (compression and decompression speed and compression ratio) of your assigned compressors in the Squash benchmark. Note that the benchmark includes many different files and multiple machines, and some compressors have multiple modes/levels. Is the compressor exceptionally good at something?
  2. Take a brief look at the code and the documentation of each assigned compressors. How easy or difficult it would be to find out exactly what the compressor does and how it achieves it's performance?

Mon 6.2.2017: Informal meeting for discussion and questions. Not mandatory.

Mon 13.2.2017: Mandatory attendance.

  • Oral reports on Assignment 1
  • Assignment 2: Each student selects one of the compressors (or groups of compressors) from the "Compressors for Assignment 2" list below or an article about LZ compression and prepares a report and a presentation on the compressor/article. Send your three preferences (compressors and/or articles) to Juha and Simon by Friday, 17.02

Mon 20.2.: Informal meeting for discussion and questions. Not mandatory.

Mon 27.2.: Informal meeting for discussion and questions. Not mandatory.

Mon 6.3.: Informal meeting for discussion and questions. Not mandatory.

Mon 13.3.: Presentations. Mandatory attendance.

  • Tuukka Paukkunen: Density
  • Peter Goetsch: Pithy & Snappy

Mon 20.3.: Presentations. Mandatory attendance.

  • Jiri Hamberg: wflz
  • Simo Salmirinne: Brotli
  • Yan He: LZHAM
  • Jasu Viding: briefLZ, yalz77, LZJB

Mon 27.3.: Presentations. Mandatory attendance.

  • Matti Pulli: Gipfeli
  • Pekka Väänänen: lz4
  • Olavi Lintumäki: lz4

Mon 3.4.: Informal meeting for discussion and questions. Not mandatory.

Mon 10.4.: Deadline for reports on compressors.
Presentations. Mandatory attendance.

  • Mikko Määttä: The use of asymmetric numeral systems as an accurate replacement for Huffman coding
  • Kalle Lammenoja: Massively-Parallel Lossless Data Decompression
  • Timo Mäki: A Fast Implementation of Deflate

Mon 17.4.: Easter. No meeting.

Mon 24.4.: Presentations. Mandatory attendance.

  • Bella Zhukova: Bicriteria data compression: efficient and usable
  • Joonas Nietosvaara: Effective Construction of Relative Lempel-Ziv Dictionaries

Mon 1.5.: Deadline for reports on articles.


Literature and material

Compressors for Assignment 1:

  1. BriefLZ, LZ4, Zstandard
  3. Brotli, Gipfeli, LZMA
  4. FastLZ, LZG, zlib-ng
  5. Density, Pithy, CRUSH
  6. LZJB, ms-compress, Snappy
  7. LZF, QuickLZ, wfLZ
  8. FastLZ, LZO, yalz77
  9. LZF, LZMA, Zstandard
  10. LZ4, QuickLZ, Snappy
  11. Brotli, Gipfeli, wfLZ
  12. CRUSH, LZG, zlib-ng
  13. Density, LZHAM, Pithy
  14. BriefLZ, ms-compress, CSC
  15. LZ4, LZJB, yalz77

Compressors for Assignment 2:

  1. Density
  2. Pithy and Snappy (two related compressors)
  3. LZHAM
  4. Gipfeli
  5. Brotli
  6. wflz
  7. zlib-NG
  8. CSC
  9. LZ4 (possible to work with a partner)
  10. LZMA (possible to work with a partner)
  11. briefLZ, yalz77, LZJB (three simple compressors)
  12. Z-standard (possible to work with a partner)

Articles for Assignment 2:

  • Danny Harnik, Ety Khaitzin, Dmitry Sotnikov, Shai Taharlev:
    A Fast Implementation of Deflate.
    DCC 2014: 223-232.
  • Jarek Duda, Khalid Tahboub, Neeraj J Gadgil, Edward J. Delp:
    The use of asymmetric numeral systems as an accurate replacement for Huffman coding.
    Picture Coding Symposium (PCS), 2015, 64-69.
  • Evangelia A. Sitaridi, René Müller, Tim Kaldewey, Guy M. Lohman, Kenneth A. Ross:
    Massively-Parallel Lossless Data Decompression.
    45th International Conference on Parallel Processing (ICPP), 2016, 242-247
  • Kewen Liao, Matthias Petri, Alistair Moffat, Anthony Wirth:
    Effective Construction of Relative Lempel-Ziv Dictionaries.
    25th International Conference on World Wide Web (WWW), 2016, 807-816
  • Andrea Farruggia, Paolo Ferragina, Rossano Venturini:
    Bicriteria Data Compression: Efficient and Usable.
    22nd Annual European Symposium on Algorithms (ESA), 2014, 406-417