University of Helsinki Department of Computer Science
 

Department of Computer Science

Department information

 

Querying String Databases with Modal Logic

Matti Nykänen: Querying String Databases with Modal Logic. PhD Thesis, Report A-1997-3, Department of Computer Science, University of Helsinki, November 1997. 151 pages. <http://www.cs.helsinki.fi/TR/A-1997/3>

Full paper: gzip'ed Postscript file
Errdata gzip'ed Postscript file
Metadata: XML file

Abstract

New application areas for databases demand the possibility of managing not only the traditional atomic but also structured data types. One type of this kind is a finite sequence of characters drawn from a finite alphabet. These string databases are important for example in molecular biology data management, because they can for instance represent DNA sequences directly as strings from alphabet {A,C,G,T}. Then the query language must be able to manipulate strings both as indivisible entities and as ordered sequences of distinct characters, perform pattern matching in strings, compare several strings with each other, and generate new strings not yet in the database. This work presents Alignment Calculus, a new modal logic extension of the relational calculus, which satisfies all these requirements with a new unified string manipulation formalism. This formalism is based on the concept of multiple alignment of several strings; computational molecular biology employs an analogous concept. In the language, alignments are described logically with string formulae, a new extension of linear-time temporal logic into multiple strings. The abstract expressive power of Alignment Calculus is shown to equal the Arithmetical Hierarchy. The work develops also a syntactically safe sublanguage, whose queries require only finite computational resources. This sublanguage is constructed by first identifying a decidable subcase of string formula safety analysis, even though the general problem is shown to be undecidable. This safe sublanguage is shown to equal the Polynomial-time Hierarchy in its expressive power, and therefore to capture all the string queries occurring in practical situations. The operational counterpart of Alignment Calculus, Alignment Algebra, is developed by replacing the selection operator of relational algebra with one employing multitape nondeterministic finite state automata corresponding to the aforementioned string formulae, and adding an explicit domain symbol. The aforementioned safe sublanguage has also a counterpart in this algebra: expressions, in which all the domain symbol occurrences are restricted by immediately enclosing automata. A finite evaluation strategy is developed for these expressions.

Index Terms

Categories and Subject Descriptors:
F.1.1 [Computation by Abstract Devices] Models of Computation - Automata
F.4.3 [Mathematical Logic and Formal Languages] Formal Languages - Classes Defined by Grammars or Automata
H.2.3 [Database Management] Languages - Query Languages

General Terms: Automata, Logic, Databases

Additional Key Words and Phrases: multiple automata, modal logic, string databases, computational biology


Online Publications of Department of Computer Science, Anna Pienimäki