University homepage Suomenkielinen versio puuttuu Inte på svenska In english
University of Helsinki Department of Computer Science
 

Department of Computer Science

Succinct Data Structures (SuDS) -research group

Group description

The research group studies a new subfield of data compression - data structure compression. The new aspect compared to traditional compression is that the compressed data (structure) needs to be represented so that access to its internal parts is provided without uncompressing the whole structure. As an example, consider a binary tree of n nodes. It is possible to represent the tree succinctly using about 2n bits so that the children and parent of any node can be accessed in constant time. A standard link structure representation of a binary tree takes of order n log n bits.

In addition to providing new algorithms and data structures in the field of study, the group contributes by engineering open source implementations targeted to applications such as sequence analysis in Bioinformatics, full-text search in Database Systems, and retrieval of structured documents in Information Retrieval.

News

  • Try out our Genome Browser - a visual interface to the suffix tree of Human Genome running on a server machine. It demonstrates the capabilities of suffix tree concept (and compressed data structures as its practical implementation) for analyzing and searching genome sequences. Current version features some basic functionalities as a proof of concept. More features will be added gradually, including user-guide (just try out all mouse buttons for now to explore its functionalities). SUDS Genome Browser

People

Visitors

  • Susana Ladra, PhD student, University of A Coruna, Spain (August 2009 -)
  • Antti Laaksonen, Research Assistant
  • (summer 2009)
  • Kashyap Dixit, MSc student, IIT Kanpur, India (May 2006 - July 2006)
  • Wolfgang Gerlach, Diploma student, Bielefeld University, Germany (August 2006 - September 2006)
  • Johannes Fischer, PhD student, LMU München, Germany (June 2007)

Collaboration

The project collaborates with several researchers abroad. A close and long-term collaboration is with Professor Gonzalo Navarro from University of Chile.

Software

Recent Publications

  • Diego Arroyuelo, Francisco Claude, Sebastian Maneth, Veli Mäkinen, Gonzalo Navarro, Kim Nguyên, Jouni Sirén, and Niko Välimäki: Fast In-Memory XPath Search over Compressed Text and Tree Indexes.
    Accepted to ICDE 2010, Long Beach, California, USA, March 1-6, 2010.
  • Jouni Sirén: Compressed Suffix Arrays for Massive Data.
    In SPIRE 2009, Springer LNCS 5721, pp. 63-74, Saariselkä, Finland, August 25-27, 2009.
  • Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, and Niko Välimäki. Storage and Retrieval of Individual Genomes.
    In Proc. 13th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2009), Springer-Verlag LNCS 5541, pp. 121-137, Tucson, Arizona, May 18-21, 2009.
    [Article online]
    [Preprint]
  • Johannes Fischer, Veli Mäkinen, and Niko Välimäki: Space-Efficient String Mining under Frequency Constraints.
    In Proc. Eighth IEEE International Conference on Data Mining (ICDM 2008), IEEE Computer Society, pages 193-202, Pisa, Italy, December 15-19, 2008.
    [Article online]
    [Preprint]
  • Jouni Siren, Niko Välimäki, Veli Mäkinen, and Gonzalo Navarro: Run-Length Compressed Indexes Are Superior for Highly Repetitive Sequence Collections.
    In Proc. 15th Symposium on String Processing and Information Retrieval (SPIRE 2008), Springer-Verlag LNCS 5280, pp. 164-175, Melbourne, Australia, November 10-12, 2008.
    [Article online] [Preprint]
  • Johannes Fischer, Veli Mäkinen and Gonzalo Navarro: An(other) Entropy-Bounded Compressed Suffix Tree.
    In Proc. Annual Symposium on Combinatorial Pattern Matching (CPM 2008), Springer-Verlag LNCS 5029, pages 152-165, Pisa, Italy, June 18-20, 2008.
    [Article online]
    [Preprint]
  • Veli Mäkinen and Gonzalo Navarro: On Self-indexing Images - Image Compression with Added Value.
    In Proc. Data Compression Conference (DCC 2008), IEEE Computer Society, pages 422-431, Snowbird, Utah, USA, March 25-27, 2008.
    [Article online]
    [Preprint]
  • Veli Mäkinen and Gonzalo Navarro.
    Implicit Compression Boosting with Applications to Self-Indexing,
    In Proc. 14th Symposium on String Processing and Information Retrieval (SPIRE 2007), Santiago, Chile, October 29-31, 2007.
    [Preprint]
  • Niko Välimäki and Veli Mäkinen.
    Space-efficient Algorithms for Document Retrieval,
    In Proc. 18th Annual Symposium on Combinatorial Pattern Matching (CPM 2007),
    Springer-Verlag LNCS 4580, pp. 205-215, London, Ontario, Canada, July 9-11, 2007.
  • Niko Välimäki, Wolfgang Gerlach, Kashyap Dixit, and Veli Mäkinen.
    Engineering a Compressed Suffix Tree Implementation,
    In Proc. 6th Workshop on Experimental Algorithms (WEA 2007),
    Springer-Verlag LNCS 4525, pp. 217-228, June 6-9, Rome, Italy.
    Earlier as a technical report: [Pdf]
  • Niko Välimäki, Wolfgang Gerlach, Kashyap Dixit, and Veli Mäkinen.
    Compressed Suffix Tree - A Basis for Genome-scale Sequence Analysis,
    Bioinformatics, 23:629-630, 2007.
    [Preprint]
  • Veli Mäkinen and Gonzalo Navarro.
    Dynamic Entropy-Compressed Sequences and Full-Text Indexes.
    In Proc. 17th Annual Symposium on Combinatorial Pattern Matching (CPM 2006), Springer-Verlag LNCS 4009, pp. 306-317, Barcelona, Spain, July 5-7, 2006.
    [Abstract] [Pdf]
    (Extended version to appear in ACM Transactions on Algorithms, see [Preprint])
  • Gonzalo Navarro and Veli Mäkinen.
    Compressed Full-Text Indexes.
    ACM Computing Surveys, Vol. 39, No. 1, Article 2, 2007.
    [Preprint]
  • Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, and Gonzalo Navarro.
    Compressed Representations of Sequences and Full-Text Indexes.
    ACM Transactions on Algorithms, Vol. 3, Issue 2, Article 20, May 2007
    [Abstract] [Pdf]
  • Veli Mäkinen and Gonzalo Navarro.
    Succinct Suffix Arrays based on Run-Length Encoding.
    Nordic Journal of Computing, 12(1):40-66, 2005.
    [Abstract] [Bibtex] [PostScript]

Funding

Previous update: 23.10.2009 - Veli Mäkinen