University of Helsinki Department of Computer Science


Using sgrep for querying structured text files

Jani Jaakkola and Pekka Kilpeläinen: Using sgrep for querying structured text files. Report C-1996-83, Department of Computer Science, University of Helsinki, November 1996. 11 pages.

Full paper: gzip'ed Postscript file
Metadata: XML file


Sgrep is a Unix tool for searching the contents of text files. Sgrep implements an algebra of unrestricted text fragments called regions. The algebra allows the retrieval of document components, represented as regions, based on conditions on their relative containment and ordering. This simple yet powerful model is suitable for querying structured document formats like electronic mail, RTF, LaTeX, HTML, or SGML documents. We describe the sgrep query language and give examples of its use. In particular, we explain how sgrep can be used for querying and assembling SGML documents.
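
The region model described above can be illustrated with a small sketch. This is not sgrep's implementation or its query syntax; it is a minimal Python approximation that assumes regions are (start, end) character offsets and that delimiters do not nest. The helper names (find_regions, contained_in, containing, followed_by) and the sample document are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# A region is a (start, end) pair of character offsets, end exclusive.
Region = Tuple[int, int]


def find_regions(text: str, open_tag: str, close_tag: str) -> List[Region]:
    """Pair each opening delimiter with the next closing delimiter.

    Assumption: delimiters do not nest, so a simple left-to-right scan suffices.
    """
    regions: List[Region] = []
    pos = 0
    while True:
        start = text.find(open_tag, pos)
        if start == -1:
            break
        end = text.find(close_tag, start + len(open_tag))
        if end == -1:
            break
        end += len(close_tag)
        regions.append((start, end))
        pos = end
    return regions


def _inside(inner: Region, outer: Region) -> bool:
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def contained_in(inner: List[Region], outer: List[Region]) -> List[Region]:
    """Containment condition: keep regions of `inner` that lie within some region of `outer`."""
    return [r for r in inner if any(_inside(r, o) for o in outer)]


def containing(outer: List[Region], inner: List[Region]) -> List[Region]:
    """Containment condition: keep regions of `outer` that enclose some region of `inner`."""
    return [o for o in outer if any(_inside(r, o) for r in inner)]


def followed_by(first: List[Region], second: List[Region]) -> List[Region]:
    """Ordering condition: keep regions of `first` that end before some region of `second` starts."""
    return [a for a in first if any(a[1] <= b[0] for b in second)]


# Hypothetical SGML-like sample document.
doc = "<sec><title>Intro</title><p>sgrep</p></sec><sec><title>Notes</title></sec>"

sections = find_regions(doc, "<sec>", "</sec>")
titles = find_regions(doc, "<title>", "</title>")
paras = find_regions(doc, "<p>", "</p>")

# Titles of the sections that contain at least one paragraph.
sections_with_para = containing(sections, paras)
title_regions = contained_in(titles, sections_with_para)
title_texts = [doc[s:e] for s, e in title_regions]
print(title_texts)
```

Representing every document component as a plain offset pair is what lets the containment and ordering conditions compose: each helper takes region lists and returns a region list, so the result of one query can feed the next.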

Index Terms

Categories and Subject Descriptors:
H.2.3 [Database management]: Languages
H.3.3 [Information storage and retrieval]: Information search and retrieval
I.7.2 [Text processing]: Document Preparation

General Terms: Design, Languages

Additional Key Words and Phrases: text search tools, structured documents, SGML
