Project in Data Compression Techniques

Algoritmit ja koneoppiminen
Syventävät opinnot
Implementation and experimental comparison of data compression algorithms and/or design and analysis of new compressed data structures.
Vuosi Lukukausi Päivämäärä Periodi Kieli Vastuuhenkilö
2012 kevät 13.03-24.04. 4-4 Englanti Veli Mäkinen


Aika Huone Luennoija Päivämäärä
Ti 14-16 B119 Veli Mäkinen 13.03.2012-24.04.2012
To 10-14 C220 Veli Mäkinen 26.04.2012-26.04.2012


The project consists of

  • implementation of one or more data compression algorithms
  • experimental comparison and/or analysis of the algorithm(s)
  • presentation of the results

The project can be done in groups of at most four students. In a group each student is responsible for specific algorithms and the group together is responsible for the experiments and the presentation.

Course assumes Data Compression Techniques -course or similar knowledge as background. Topics include general compression methods and  compressed data structures. In addition, there will be some special topics related to high-throughput sequencing tailored for bioinformatics students.

  • List of topics.
  • Other topics related to the Data Compression Techniques -course are also possible.


  • MH: topic 1 (adaptive Huffman coding)
  • JK & PM: topic 2 (arithmetic coding & PPM)
  • JL & MR & MY: topic 4 (compressed graph representations)
  • HA: topic 5 (dictionary compression & random text generation)
  • AK & JY: topic 6 (DNA read set compression)
  • AH: topic 7 (tuning Golinski's rank & select index)
  • DK: topics 8,9 (compression of LPF tables, repetitive data and fixed block compression)
  • ZL: own topic (compressed representation for grid points)


Kurssin suorittaminen

Algorithm implementation

The algorithms can be implemented with any programming language under the restriction that the programs can be compiled and executed on the department computers.

The algorithm implementations are returned to the instructor by noon on Fri 20.4. The contributions of each group member should be stated clearly.


The purpose of the experiments is to determine how the compression efficiency changes depending on inputs. Also running time of compression / decompression, and peak space usage is of interest.


Each group gives a presentation of the chosen algorithms, implementations, and  results of the experiments in the end of the project. This presentation session will be open to other students and staff of the department.

The presentation session takes place Thursday 26.4 at 10-14 C220.


Each part of the project (implementation, experiments, presentation) contributes one third to the total score. In general,
the experiment and presentation score will be the same for all members of a group. The implementation scores will be personal.

Important Dates

Guidance every week Tuesdays 14-16, B119.

The implementation (with documentation) must be returned by noon on Friday 20.4.

The project presentation takes place in on Thursday 26.4 at 10-14 C220.

  • 10.15 HS: "Distributed Suffix Array Construction" (string processsing project)
  • 10.35 PM & JK: "Engineering adaptive data compression"
  • 10.55 ZL: "2-Dimensional Static Grid Compression Using Blobbing"
  • 11.15 HA: "Random text generation via dictionary-based decompression"
  • 11.35 DK: "Fixed Block Compression of Repetitive Data and Read Data"
  • 11.55 AK & JY: "Readzip"


Kirjallisuus ja materiaali