Trade off Between Compression and Search Times in Compact Suffix Array

Veli Mäkinen

vmakinen@cs.Helsinki.FI
Department of Computer Science, P.O Box 26 (Teollisuuskatu 23)
FIN-00014 University of Helsinki, Finland.

Abstract. Suffix array is a widely used full-text index that allows fast searches on the text. It is constructed by sorting all suffixes of the text in the lexicographic order and storing pointers to the suffixes in this order. Binary search is used for fast searches on the suffix array. Compact suffix array is a compressed form of the suffix array that still allows binary searches, but the search times are also dependent on the compression. In this paper, we answer some open questions concerning the compact suffix array, and study practical issues, such as the trade off between compression and search times, and show how to reduce the space requirement of the construction. Experimental results are provided in comparison with other search methods. The results show that usually the size of a compact suffix array is less than twice the size of the text, while the search times are still comparable to those of suffix arrays.