Sgrep home page


What is sgrep?

sgrep (structured grep) is a tool for searching and indexing text, SGML, XML and HTML files and for filtering text streams using structural criteria. The data model of sgrep is based on regions, which are nonempty substrings of text. Regions are typically occurrences of constant strings, SGML tags, or meaningful text elements that can be recognized through delimiting strings or through the built-in SGML, XML and HTML parser. Regions can be arbitrarily long, arbitrarily overlapping, and arbitrarily nested.
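The region model described above can be illustrated with a small Python sketch. This is not sgrep's implementation or its query syntax; the names Region, occurrences and delimited are hypothetical and exist only for this illustration. A region is represented as a pair of offsets, constant strings yield regions that may overlap, and a pair of delimiting strings yields regions that may nest.

```python
from typing import List, Tuple

# A region is a nonempty substring, stored as (start, end) with end exclusive.
Region = Tuple[int, int]


def occurrences(text: str, needle: str) -> List[Region]:
    """Return every occurrence of a constant string as a region.

    Overlapping occurrences are kept, since regions may overlap.
    """
    if not needle:
        raise ValueError("regions are nonempty, so the needle must be nonempty")
    found = []
    pos = text.find(needle)
    while pos != -1:
        found.append((pos, pos + len(needle)))
        pos = text.find(needle, pos + 1)  # advance by one to keep overlaps
    return found


def delimited(text: str, opener: str, closer: str) -> List[Region]:
    """Return regions running from an opener to its matching closer.

    Each closer is paired with the nearest unmatched opener before it,
    so nested regions are preserved. Unmatched delimiters are ignored.
    Assumes opener and closer are different strings.
    """
    events = [(s, 0, e) for s, e in occurrences(text, opener)]
    events += [(s, 1, e) for s, e in occurrences(text, closer)]
    stack: List[int] = []
    regions: List[Region] = []
    for start, kind, end in sorted(events):
        if kind == 0:
            stack.append(start)
        elif stack:
            regions.append((stack.pop(), end))
    return sorted(regions)


if __name__ == "__main__":
    source = "f(a(b)c)"
    for start, end in delimited(source, "(", ")"):
        print(start, end, source[start:end])
```

Storing regions as offsets rather than as copied strings keeps overlapping and nested regions distinguishable even when their text is identical.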

Sgrep is a convenient tool for querying almost any kind of text file that has a well-known structure, such as program source code, mail folders, news folders, HTML and SGML. With relatively simple queries you can display mail messages by their subject or sender, extract titles, links or any other regions from HTML files, extract function prototypes from C source, or make complex queries on SGML files based on the DTD of the file.
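As a rough illustration of the "mail messages by their subject" query mentioned above, the following Python sketch selects the message regions that contain a given subject string. It does not use sgrep or its query language. The helper names (find_all, messages, containing) are hypothetical, and the sketch assumes a simplified mbox-style folder in which every message begins with a line starting with "From ".

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end) offsets, end exclusive


def find_all(text: str, needle: str) -> List[Region]:
    """Regions for every occurrence of a constant string."""
    found = []
    pos = text.find(needle)
    while pos != -1:
        found.append((pos, pos + len(needle)))
        pos = text.find(needle, pos + 1)
    return found


def messages(folder: str) -> List[Region]:
    """One region per message.

    Assumption: a message starts at a line beginning with "From " and
    runs until the next such line or the end of the folder.
    """
    starts = [
        i for i in range(len(folder))
        if folder.startswith("From ", i) and (i == 0 or folder[i - 1] == "\n")
    ]
    ends = starts[1:] + [len(folder)]
    return list(zip(starts, ends))


def containing(candidates: List[Region], wanted: List[Region]) -> List[Region]:
    """Keep the candidate regions that fully contain at least one wanted region."""
    return [
        (cs, ce) for cs, ce in candidates
        if any(cs <= ws and we <= ce for ws, we in wanted)
    ]


if __name__ == "__main__":
    folder = (
        "From alice Mon\nSubject: sgrep release\n\nhello\n"
        "From bob Tue\nSubject: lunch\n\nhi\n"
    )
    hits = containing(messages(folder), find_all(folder, "Subject: sgrep"))
    for start, end in hits:
        print(folder[start:end])
```

The same containing filter applies to any pair of region lists, for example HTML elements that contain a particular word.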

NEW! Third prerelease of sgrep-2 is out!

Sgrep version 1.92a is out. This version contains the sources, Win32 binary and binaries for HP-UX, Linux, OSF1 and Solaris. See the download page. The Win32 binary also includes the m4 macro processor.

Version 1.92 also fixes a fatal bug that caused sgrep-1.91 to dump core when searching without the SGML scanner.

Major new features since 1.90a are:

Major new features in 1.90a since version 1.70 are:

Major new features since version 0.99 are:

See the README file for details.

How is sgrep used?

Sgrep queries are written in sgrep's own query language. The details of the language are covered on the sgrep manual page. See also the report using sgrep for querying structured text files. With the query language you can express queries like:

The new features in sgrep-1.90a, including indexing, are currently documented only in the README file.

The sgrep query language is at its most powerful when making complex queries on SGML-like tagged documents. See a set of example queries, including the queries above.

The most recent stable sgrep version is 0.99. See the announcement of version 0.99.

The most recent alpha version is 1.91a. See the announcement of version 1.91a.

Sgrep requirements

Sgrep-1.91a works on Win32 systems (Win95, Win98 and Windows NT) as a console application, and on any decent Unix-like system that supports memory-mapped files.

Sgrep from the net

Authors

Sgrep was made by

Jani Jaakkola, email: jjaakkol@cs.helsinki.fi
Pekka Kilpeläinen, email: Pekka.Kilpelainen@helsinki.fi


Last modified: Dec 22, 1998

This document is maintained by Jani Jaakkola
at email address jjaakkol@cs.helsinki.fi