## An Alphabet-Friendly FM-Index

*Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, and Gonzalo Navarro*

**Abstract.**
We show that, by combining an existing compression boosting technique
with the wavelet tree data structure, we are able to design a
variant of the *FM-Index* which scales well with the size of the input
alphabet *A*. The size of the new index built on a string *T[1,n]* is
bounded by *n
H_k(T) + O((n log log n)/ log_{|A|} n)* bits, where
*H_k(T)* is the *k*-th order empirical entropy of *T*.
The above bound holds simultaneously for
all *k<=a log_{|A|} n* and *0< a < 1*. Moreover,
the index does not depend on the parameter *k*, which plays a role
only in analysis of the space occupancy.
Using our index, counting the occurrences of an arbitrary pattern *P[1,p]* as
a substring of *T* takes *O(p log |A|)* time. Locating each pattern occurrence
takes *O(log |A| log^{1+e} n)* time.
Reporting a text substring of length *l* takes
*O((l + log^{1+e} n) log |A|)* time.