Dynamic Entropy-Compressed Sequences and Full-Text Indexes.

Veli Mäkinen and Gonzalo Navarro

Given a sequence of n bits with binary zero-order entropy H_0, we present a dynamic data structure that requires nH_0 + o(n) bits of space, which is able of performing rank and select, as well as inserting and deleting bits at arbitrary positions, in O(log n) worst-case time. This extends previous results by Hon et al. [ISAAC 2003] achieving O(log n/ loglog n) time for rank and select but O(polylog(n)) amortized time for inserting and deleting bits, and requiring n+o(n) bits of space; and by Raman et al. [SODA 2002] which have constant query time but a static structure. In particular, our result becomes the first entropy-bound dynamic data structure for rank and select over bit sequences. We then show how the above result can be used to build a dynamic full-text self-index for a collection of texts over an alphabet of size c, of overall length n and zero-order entropy H_0. The index requires nH_0 + o(n log c) bits of space, and can count the number of occurrences of a pattern of length m in time O(m log n log c). Reporting the occ occurrences can be supported in O(occ log^2 n log c) time, paying O(n) extra space. Insertion of text to the collection takes O(log n log c) time per symbol, which becomes O(log^2 n log c) for deletions. This improves a previous result by Chan et al. [CPM 2004]. As a consequence, we obtain an O(n log n log c) time construction algorithm for a compressed self-index requiring nH_0 + o(n log c) bits working space during construction.