<?xml version="1.0" encoding="ISO-8859-1"?>
<report no="A-1992-6"
  title="Tree Matching Problems with Applications to Structured Text Databases"
  date="November 1992"
  pages="110"
  genterms="Algorithms"
  keywords="structured text databases, tree pattern matching, dynamic programming, NP-completeness"
  issn=""
  isbn="951-45-6257-7">
<author name="Pekka Kilpeläinen"/>
<phd/>
<class name="F.2.2, G.2.2, H.2.3, H.3.3"/>
<file url="A-1992-6.ps.gz"/>
<abstract>
<p>
Tree matching is concerned with finding the instances,
or matches, of a given pattern tree in a given target
tree. We introduce ten interrelated matching problems
called tree inclusion problems. A specific tree inclusion
problem is defined by specifying the trees that are
instances of the patterns. The problems differ from each
other in the amount of similarity required between the
patterns and their instances. We present and analyze
algorithms for solving these problems, and show that
the computational complexities of the problems range
from linear to NP-complete. The problems are
motivated by the study of query languages for
structured text databases. The structure of a text
document can be described by a context-free grammar,
and text collections can be represented as collections
of parse trees. Matching-based operations are an
intuitive basis for accessing the contents of structured
text databases. In ``G-grammatical''  tree inclusion
problems the target tree is a parse tree over a
context-free grammar G. We show that a certain
natural class of grammars allows solving some of the
grammatical inclusion problems in linear time. Tree
inclusion problems are extended by introducing logical
variables in the patterns. These variables allow
extracting substructures of the pattern matches and
posing equality constraints on them. We show that most
of the tree inclusion problems with logical variables are
NP-hard, and also consider solving their polynomial
versions. As an application of these problems we finally
show how tree inclusion with logical variables can be
used for querying structured text databases, and discuss
how the inclusion queries should be evaluated in
practice.
</p>
</abstract>
</report>

