Department of Computer Science

SuDS project / Compressed suffix tree implementation

News

Our C++-implementation of compressed suffix tree is available for download (version 1.1).

See the preprint of a Bioinformatics application note for a brief overview and some experimental results.

Background

Our implementation of compressed suffix tree follows closely a recent article by Sadakane (Compressed Suffix Trees with Full Functionality, Theory of Computing Systems, in press). The implementation supports all typical suffix tree operations, including suffix links. In addition, it supports lowest common ancestor (lca) queries. Some operations work in constant time (traversing in the tree, lca), others are slower than normal suffix tree operations by about a logarithm factor. The space-savings are significant: on a 10MB DNA sequence, the compressed suffix tree was taking about 33MB. A normal suffix tree takes easily 160MB (and this does not yet support lca-queries). Even a plain suffix array takes more space than the compressed suffix tree. For more details on the implementation, see the manuscript describing the algorithm engineering challenges faced during the project. For a brief overview, see the poster, and for some preliminary experiments, see the WEA 2007 article.

Download

Version 1.1 - Fixed some compilation errors when using a more recent GNU C/C++.

Back to SuDS project page

Previous update: 21.10.2013 - Veli Mäkinen