GCSA
GCSA [2] is a compressed suffix array for certain finite languages. The implementation currently supports only the alphabet ACGT.
See the README in the package for further information.
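Because only the alphabet ACGT is supported, it can be useful to check input sequences before building or querying an index. The following is a minimal sketch of such a check. It is not part of the GCSA package: the names GCSA_ALPHABET, unsupported_symbols, and is_supported are hypothetical, and treating lowercase letters as unsupported is an assumption, so consult the README for the actual input requirements.

```python
# Hypothetical pre-check, not part of the GCSA distribution.
# Assumption: only the uppercase symbols A, C, G, T are accepted.
GCSA_ALPHABET = frozenset("ACGT")


def unsupported_symbols(sequence):
    """Return (position, symbol) pairs for every character outside ACGT."""
    return [(i, c) for i, c in enumerate(sequence) if c not in GCSA_ALPHABET]


def is_supported(sequence):
    """Return True if the sequence uses only symbols from ACGT."""
    return not unsupported_symbols(sequence)


if __name__ == "__main__":
    for seq in ("GATTACA", "ACGNT"):
        problems = unsupported_symbols(seq)
        if problems:
            print(seq, "has unsupported symbols:", problems)
        else:
            print(seq, "is within the ACGT alphabet")
```

Reporting positions rather than a bare yes/no makes it easier to locate ambiguity codes such as N in a long sequence before deciding how to handle them.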
The implementation is available for download under the MIT / X11 License. Our implementation of RLCSA [1] is required for compiling GCSA.
News
- 2012-02-14 Simpler and faster index. More space-efficient construction.
- 2011-08-23 Vastly improved construction algorithm.
- 2011-01-17 A new version that is significantly faster, and supports approximate searching.
- 2010-10-13 The implementation of GCSA is now available.
Downloads
- Current version (February 2012).
- February 2012. A simpler version of GCSA that is theoretically 2x (instead of 3x) slower than a regular CSA. Construction uses in-place sorting that is more space-efficient but slightly slower. Requires February 2012 version of RLCSA.
- August 2011. This implementation was used in the full version of [2]. Contains a vastly improved construction algorithm. Requires August 2011 version of RLCSA.
- January 2011. A new improved version with faster basic operations and support for approximate searching. This version was used in [2]. Requires January 2011 version of RLCSA.
- October 2010. The original version. Works with RLCSA (October 2010).
References
- [1] Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, and Niko Välimäki:
  Storage and Retrieval of Highly Repetitive Sequence Collections.
  Journal of Computational Biology 17(3):281-308, 2010.
- [2] Jouni Sirén, Niko Välimäki, and Veli Mäkinen:
  Indexing Finite Language Representation of Population Genotypes.
  Proc. WABI 2011, Springer LNCS 6833, pp. 270-281, Saarbrücken, Germany, September 5-7, 2011.
Jouni.Siren@cs.helsinki.fi

