<?xml version="1.0" encoding="ISO-8859-1"?>
<report no="C-1999-2"
  title="Nested Text-Region Algebra"
  date="January 1999"
  pages="24"
  genterms="Algorithms, Languages"
  keywords="text searching, structured documents, SGML, XML"
  issn=""
  isbn="">
<author name="Jani Jaakkola"/>
<author name="Pekka Kilpeläinen"/>
<class name="H.2.3 [Database management]: Languages"/>
<class name="H.3.3 [Information storage and retrieval]: Information search and retrieval"/>
<class name="I.7.1 [Document and text processing]: Document and Text Editing"/>
<file url="C-1999-2.ps.gz"/>
<abstract>
<p>
So-called region algebras, which operate on sets of text fragments, have been
proposed and implemented as query languages for text documents.
Text documents often contain nested regions, such as lists within lists or
procedures within procedures.
Earlier versions of region algebra do not support
querying nested regions.
We address this deficiency by proposing a new, unrestricted region algebra.
The new algebra allows nested regions to be defined dynamically. This makes
it suitable for querying documents whose hierarchical structure is
indicated by embedded markup, without any preprocessing.
We demonstrate that this nested region algebra can be implemented
efficiently by presenting and analyzing algorithms for its operations.
In particular, we show that any fixed nested region algebra expression on
a text of length <i>n</i> can be evaluated in
time <i>O(n<sup>2</sup>)</i> in the worst case, and in linear time in practice.
Nested region algebra has been implemented in a publicly available Unix
text search tool called 
<a href="http://www.cs.Helsinki.FI/~jjaakkol/sgrep.html">sgrep</a>.
</p>
</abstract>
</report>

