The Document Management Group at the Department of Computer Science
of the University of Helsinki, Finland, proudly presents

---------------------------------------------------------------------------
SGREP v0.99 - A tool for searching files for structured patterns
---------------------------------------------------------------------------

INTRODUCTION
------------

If you have ever wondered how to

  o Locate only TITLE and H1 .. H9 elements from HTML documents

  o Remove all tags from an HTML document

  o Rename all B elements to STRONG elements

  o Find out how many FIG elements there are under SUBPARA elements
    but not under PARA elements in your SGML file

  o Print out the TITLE elements from a set of HTML documents in which
    the word 'SGML' is mentioned more than 12 times, or which contain
    the word 'SGML' inside H1 or H2 elements

  o Find the senders of those mail messages in a set of mail files
    that contain the word 'SGML' in the subject line, do not contain
    'HTML' in the body of the mail, were sent in 1996 and were not
    sent from the address flame@hot.com

then sgrep is a tool for you.

Sgrep (structured grep) is a tool for searching text files and
filtering text streams using structural criteria. Sgrep implements a
query language based on so-called region expressions. Like grep, sgrep
can be used on any kind of text file. However, it is most useful for
files containing some kind of structured text, that is, text which
obeys some syntax. Examples of structured text files are SGML, HTML,
C, TeX and mail files.

ENVIRONMENT
-----------

Sgrep needs a Unix-like system to run. It has been tested on the
following platforms:

  SunOS 5.4     sparc
  Linux 1.3.85  alpha
  Linux 1.2.13  intel, a.out binaries
  Linux 1.2.13  intel, elf binaries
  HP-UX         9000/735
  OSF1          alpha

It has also been reported to run on SGI/Irix 5.2.

A macro preprocessor is most useful as a front-end to sgrep. The
authors use m4, and the delivery package contains example macro files
written for m4.
However, a C preprocessor or some other program could be used instead
of m4.

COPYRIGHT
---------

Sgrep is distributed under the GNU General Public License.

WHERE CAN I FIND IT?
--------------------

We have put up some WWW pages on sgrep at

  http://www.cs.helsinki.fi/~jjaakkol/sgrep.html

On the WWW pages you will also find the queries that solve the
problems listed above.

Source for sgrep can be downloaded from

  ftp://ftp.cs.helsinki.fi/pub/Software/Local

Sorry, there are no binary distributions (yet).

If you have a problem that you cannot solve yourself, send mail to
jjaakkol@cs.helsinki.fi.

CREDITS
-------

Sgrep was created by Jani Jaakkola (jjaakkol@cs.helsinki.fi) and
Pekka Kilpeläinen (kilpelai@cs.helsinki.fi). We wish to thank
Professor Heikki Mannila for suggesting that we design and implement
sgrep.

Sgrep is based upon the paper "An algebra for structured text search
and a framework for its implementation" by C. L. A. Clarke,
G. V. Cormack and F. J. Burkowski, The Computer Journal,
38(1):43-56, 1995. A preliminary version of their paper is available
from

  ftp://cs-archive.uwaterloo.ca/cs-archive/CS-94-30

However, sgrep is not a strict implementation of the language of
Clarke, Cormack and Burkowski. Unlike their language, sgrep is able to
deal with nested regions, e.g., lists within lists (within lists ..).

Enjoy!