###
Position-Restricted Substring Searching

####
Veli Mäkinen and Gonzalo Navarro

A full-text index is a data structure built over a text string *T[1,n]*.
The most basic functionality provided is *(a)* counting how many times a
pattern string *P[1,m]* appears in *T* and *(b)* locating all
those *occ* positions. There exist several indexes that solve *(a)*
in *O(m)* time and *(b)* in *O(occ)* time. In this paper we
propose two new queries, *(c)* counting how many times *P[1,m]*
appears in *T[l,r]* and *(d)* locating all those *occ(l,r)*
positions. These can be solved using *(a)* and *(b)* but this
requires *O(occ)* time. We present two solutions to *(c)* and
*(d)* in this paper. The first is an index that requires *O(n*log n)*
bits of space and answers *(c)* in *O(m + log n)* time and *(d)*
in *O(log n)* time per occurrence (that is, *O(occ(l,r)*log n)* time
overall). A variant of the first solution answers *(c)* in
*O(m + log log n)* time and *(d)* in constant time per occurrence,
but requires *O(n*log^(1+e) n)* bits of space for any constant *e>0*.
The second solution requires *n*m*log s (1+o(1))* bits of space,
solving *(c)* in *O(m*ceil(log s / log log n))* time and *(d)*
in *O(m*ceil(log s / log log n))* time per occurrence, where *s* is
the alphabet size. This second structure takes less space when the text is
compressible.
Our solutions can be seen as a generalization of *rank* and *select*
dictionaries, which allow computing how many times a given character *c*
appears in a prefix *T[1,i]* and also locate the *i*-th occurrence of
*c* in *T*. Our solution to *(c)* extends character *rank*
queries to *substring rank* queries, and our solution to *(d)*
extends character *select* to *substring select* queries.

As a byproduct, we show how *rank* queries can be used to implement
fractional cascading in little space, so as to obtain a succinct version of a
well-known two-dimensional range search data structure by Chazelle. We also
show how Grossi et al.'s *wavelet trees* are suitable for two-dimensional
range searching, and their connection with Chazelle's data structure.