## First Huffman, then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index

*Szymon Grabowski, Veli Mäkinen, and Gonzalo Navarro*

**Abstract.**
We design a succinct full-text index based on the idea of Huffman-compressing
the text and then applying the Burrows-Wheeler transform over it. The resulting
structure can be searched as an FM-index, with the benefit of removing the sharp
dependence on the alphabet size, *c*, present in that structure. On a text
of length *n* with zero-order entropy *H_0*, our index needs
*O(n(H_0+1))* bits
of space, without any dependence on *c*. The average search time for a
pattern of length *m* is *O(m(H_0+1))*, under reasonable assumptions. Each
position of a text occurrence can be reported in worst case time
*O((H_0+1) log n)*, while any text substring of length *L* can be retrieved in
*O((H_0+1)L)* average time in addition to the previous worst case time. By
paying *2n* additional bits of space, it is possible to maintain the average
complexities and ensure that *H_0+1* becomes *log c* in the worst case
for all the time complexities. Our index provides a relevant space/time
tradeoff between existing succinct data structures, with the additional
interest of being easy to implement.