File Selection Language Reference

Version 0.5.2 (2007-03-05)
Kristian Ovaska (kristian.ovaska [at] helsinki.fi)

1. Overview

File Selection Language (FSL) is a descriptive language for file selection. An FSL program, also called a rule set, is made up of rules. Each rule states whether a file should or should not be included in the file set.

FSL rules utilize glob patterns. The pattern * matches all files, dir1/* matches all files under dir1, dir1/somefile matches only the file dir1/somefile, and so on. However, FSL rules are not limited to bare globs. See below for a full specification of the rules.

There are two basic kinds of rules: inclusive and exclusive. Inclusive rules state that certain files should be included in the file set. When an inclusive rule is evaluated, the file system is scanned for all files matching the rule.

An exclusive rule (one that starts with "NOT") is an exception to an inclusive rule. It says that even if a file matched an inclusive rule earlier, it must be excluded from the file set. Notice that exclusive rules don't cause any file system scanning by themselves: all scanning comes from inclusive rules.

If you exclude a directory with an exclusive rule, you exclude all files and subdirectories of it as well. This is a good way to speed up file scanning.

2. General syntax

The program consists of a list of rules, which are separated by newlines. Simple rules (usually) fit on one line, while complex ones span several lines.

The character # marks the beginning of a comment. Everything after it on the same line is ignored.

Inside so-called block rules (see below), the child rules must be indented with spaces or tabs. You can choose the indentation level, but you must always use the same amount within the rule file. Mixing spaces and tabs is unwise. Nested blocks are indented in the same manner.
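The indentation and comment rules above can be illustrated with a small Python sketch. It is not the FSL interpreter's parser: the function name nesting_levels is hypothetical, the indentation unit is assumed to be fixed by the first indented line, and a # inside a quoted string is not handled.

```python
def nesting_levels(lines):
    """Return (level, text) pairs for the non-empty lines of a rule file.

    Assumptions of this sketch: the first indented line defines the
    indentation unit, and '#' never occurs inside a quoted string.
    """
    unit = None
    result = []
    for line in lines:
        text = line.split("#", 1)[0].rstrip()  # drop the comment part
        if not text.strip():
            continue  # blank or comment-only line
        indent = len(text) - len(text.lstrip(" \t"))
        if indent and unit is None:
            unit = indent  # first indented line fixes the unit
        if indent and indent % unit:
            raise SyntaxError("inconsistent indentation: %r" % line)
        level = indent // unit if indent else 0
        result.append((level, text.strip()))
    return result


rules = [
    "IN dir1  # everything below is relative to dir1",
    "    IN dir2",
    "        *",
    "NOT *.jpg",
]
levels = nesting_levels(rules)
```

Here levels holds one (level, text) pair per rule line, with the comment removed from the first line.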

Simple (non-block) rules generally fit on one line. Because of the block indentation system, a simple rule normally cannot span several lines; doing so is a syntax error. However, expressions (see below) that have an open parenthesis can span several lines freely. Python programmers will recognize the FSL indentation system.

Everything is case-insensitive: keywords and glob patterns.

3. Glob patterns

When you write glob patterns, you can use two forms: bare strings and quoted strings. Bare strings are written as-is, while quoted strings have quotation marks around them.

Bare string: dir1/*
Quoted string: "dir1/*"

There are limitations to bare strings. For example, a pattern that contains a space, such as "aaa bbb", must be a quoted string.

Glob patterns may contain both forward slashes (/) and backward slashes (\). Forward slashes work on Windows, too, and backward slashes work on Unix. Glob patterns may contain full Windows drive specifiers (e.g. c:\somedir\*); obviously, these don't work on Unix.

By default, glob patterns are recursive, i.e. * matches all files, including those in subdirectories. You get nonrecursive behaviour by appending "NONREC" to the glob pattern. For example, * NONREC matches only the files in the current root directory, but not those in subdirectories.
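The difference between recursive and NONREC matching can be sketched in Python. This is only a model of the behaviour described above, not the interpreter's algorithm: the function names are hypothetical, paths are matched as strings without scanning the file system, and recursion is modelled by letting * cross directory separators (which is how Python's fnmatch behaves).

```python
from fnmatch import fnmatchcase


def normalize(path):
    """Treat / and \\ alike and ignore case, as FSL patterns do."""
    return path.replace("\\", "/").lower()


def matches(pattern, path, recursive=True):
    """Return True if a relative path matches a relative glob pattern."""
    pat, p = normalize(pattern), normalize(path)
    if recursive:
        # fnmatch lets '*' match '/' too, so '*' reaches into subdirectories.
        return fnmatchcase(p, pat)
    # NONREC (assumed model): same depth, each component matched separately.
    pat_parts, path_parts = pat.split("/"), p.split("/")
    return len(pat_parts) == len(path_parts) and all(
        fnmatchcase(part, pat_part)
        for part, pat_part in zip(path_parts, pat_parts)
    )
```

With this model, matches("*", "a/b/c.txt") is true while matches("*", "a/b/c.txt", recursive=False) is false, and matches("dir1/somefile", "dir1/sub/somefile") is false in both modes.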

There are two flavours of glob patterns: absolute and relative. Absolute patterns start with a forward or backward slash or a Windows drive specifier. A pattern that is not absolute is, logically, a relative pattern.
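As a rough illustration, the classification could be written like this in Python. The function name is_absolute is hypothetical, and the sketch assumes that a drive specifier is a single letter followed by a colon.

```python
import re


def is_absolute(pattern):
    """Return True for patterns starting with /, \\ or a drive specifier."""
    if pattern.startswith(("/", "\\")):
        return True
    # Assumption: a Windows drive specifier is one letter plus a colon.
    return re.match(r"[A-Za-z]:", pattern) is not None
```

Under these assumptions, /usr/* and c:\somedir\* are absolute, while dir1/* is relative.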

Relative glob patterns are evaluated in the context of a root directory. By default, the root directory is the current working directory, but it may be set to any directory. For example, the rule * will produce all files in the file system if the root directory is the file system root, but only the files under /usr/local if the root directory is /usr/local. The root directory is given to the FSL interpreter as a parameter. Also, so-called IN-blocks (see below) change the effective root directory temporarily. Absolute patterns are not allowed inside IN-blocks.

Usually, it is better to use relative globs, since they are more flexible than absolute globs. Absolute globs are always evaluated in the context of the same root directory, the file system root. Let's say you have created a relative rule set for your Unix machine that you normally evaluate with / as the root directory. Some day, you mirror the file system onto another Unix machine (or a Windows machine using Samba) into a directory /usr/somedir. Now you can simply reuse your existing relative rule set with /usr/somedir as the root directory. This wouldn't be possible if you had hard-coded all the paths into the rules.

4. Rules

There are two basic rule types: the glob list rule and the for-each rule. Both may be prefixed with "NOT", which makes them exclusive rules. There are also two compound rule types: the IN-block and the IF-block.

<rule> := (NOT)? <glob-list>
        | (NOT)? <for-each>
        | IN <directory> <start-block> <rule>+ <end-block>
        | IF <expression> <start-block> <rule>+ <end-block>

4.1 Glob list rule

This is the most basic rule. A glob list rule is, as the name implies, a list of glob patterns separated by commas. A file matches a glob list rule if it matches any of the globs. In a glob list, bare strings and quoted strings may be mixed freely, as can recursive and nonrecursive (NONREC) glob patterns.

Format:

<glob-pattern> (, <glob-pattern>)* (IF <expression>)?

The IF-expression is optional. If present, the glob list rule is applied only if the expression is true. The expression is evaluated only once, not for every file. To evaluate an expression for every file in turn, which is usually what you want, use the for-each rule instead.

Examples:

usr/local/*
somefile, "some file with spaces"
*.gif, *.jpg, *.png
NOT *.ps, *.eps
    (excludes both *.ps and *.eps)
*.html IF exists("index.html")
    (include *.html files only if index.html is present)

4.2 For-each rule

Format:

EACH <variable name> (IN <glob list>)? IF <expression>

The for-each rule is an enhanced glob list rule. Each file matched by the glob list is included (or excluded, if the rule starts with NOT) only if the expression is true for that file. The expression is evaluated for every file in turn.

The IN-section is optional. If omitted, the glob * is used.

Examples:

EACH f IN * IF size(f) > 1024     (include files larger than 1 KB)
EACH f IF size(f) > 1024          (the same)
NOT EACH f IN *.ps IF date(f) < "2000" 
  (excludes *.ps files from the previous millennium)
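The difference between a glob list rule with an IF-expression (evaluated once) and a for-each rule (evaluated per file) can be sketched in Python. Everything here is illustrative: the function names are hypothetical, files are represented by a dictionary of made-up sizes instead of a real file system, and only the inclusive case is shown.

```python
from fnmatch import fnmatchcase


def _matches_any(path, globs):
    # Case-insensitive matching; '*' also reaches into subdirectories.
    return any(fnmatchcase(path.lower(), g.lower()) for g in globs)


def glob_list_rule(paths, globs, condition=None):
    """Glob list rule: the optional condition is evaluated once."""
    if condition is not None and not condition():
        return set()
    return {p for p in paths if _matches_any(p, globs)}


def each_rule(paths, predicate, globs=("*",)):
    """For-each rule: the predicate is evaluated for every matched file."""
    return {p for p in paths if _matches_any(p, globs) and predicate(p)}


# Hypothetical file sizes in bytes, standing in for size(f).
sizes = {"a.txt": 2048, "b.txt": 10, "index.html": 500}

# EACH f IF size(f) > 1024
large = each_rule(sizes, lambda f: sizes[f] > 1024)

# *.html IF exists("index.html")
html = glob_list_rule(sizes, ["*.html"], lambda: "index.html" in sizes)
```

In this sample data only a.txt exceeds 1024 bytes, so large contains that single file, while html contains index.html because the condition holds for the rule as a whole.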

4.3 IN-block

An IN-block contains a list of rules that are evaluated in a different root directory. The effective root directory is calculated by concatenating the previous root directory and the directory given in the IN-block header.

The rules under the IN-block can be any rules: glob lists, for-each rules, or other IN-blocks.

Format:

IN <directory>
    <rule1>
    <rule2>
    ...

All glob patterns must be relative. The directory specifier may be absolute if the IN-block is a top-level IN-block. In a nested IN-block, the directory specifier must also be relative.

Example:

IN dir1
    *

This includes all files under dir1 and is exactly the same as the rule

dir1/*

Example of nested IN-blocks:

    IN dir1
        IN dir2
            IN dir3
                *

This matches the files dir1/dir2/dir3/*.
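The way nested IN-blocks build up the effective root directory can be sketched in Python. The function name effective_root is hypothetical, and the sketch assumes that an absolute directory in a top-level IN-block replaces the previous root, which the text above does not spell out.

```python
import posixpath


def effective_root(root, in_dirs):
    """Concatenate the root with the directories of nested IN-blocks.

    in_dirs lists the IN-block directories from outermost to innermost.
    """
    for depth, directory in enumerate(in_dirs):
        directory = directory.replace("\\", "/")
        if depth > 0 and directory.startswith("/"):
            raise ValueError("nested IN-block directory must be relative")
        # Assumption: an absolute top-level directory replaces the root,
        # which is how posixpath.join treats an absolute component.
        root = posixpath.join(root, directory)
    return root


nested = effective_root("/usr/local", ["dir1", "dir2", "dir3"])
```

For the nested example above, evaluated with /usr/local as the root directory, nested is /usr/local/dir1/dir2/dir3, and the rule * inside the innermost block is evaluated there.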

4.4 IF-block

An IF-block is a bit like a glob list rule with an IF-expression, but an IF-block may contain several rules. The rules are applied only if the expression evaluates to true. The expression is evaluated only once.

Format:

IF <expression>
    <rule1>
    <rule2>
    ...

5. Expressions

5.1 General

Expressions are used in for-each rules, glob list rules and IF-block rules to determine whether a rule should be applied. Each expression evaluates to true or false.

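Expressions are made of string, number and timestamp literals, variable names, function calls, the comparison operators <, <=, >, >=, = and !=, the logical operators AND, OR and NOT, and parentheses (see the expression format below).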

Timestamp literals are written inside quotation marks just like strings. However, they are converted to a "real" timestamp representation internally. An invalid timestamp literal results in a parse error. Accepted timestamp formats are:

Notice that logical NOT (inside an expression) is conceptually different from the exclusion NOT before a rule. Logical NOT merely reverses the truth value of an expression.

Expression format:

<expression> := <simple-expression> ((AND | OR) <expression>)?
<simple-expression> := (NOT)? <atom> (<compare-op> <atom>)?
                     | (NOT)? "(" <expression> ")"
<atom> := <string>
        | <number>
        | <variable-name>
        | <function-name> "(" <atom> ")"
<compare-op> := "<" | "<=" | ">" | ">=" | "=" | "!="

Expressions with an open parenthesis can span several lines, unlike normal simple FSL rules. Inside the parentheses, indentation doesn't matter.

Example:

EACH f IN *.txt IF (size(f) < 1000
                  OR age(f) < 30)

The following example would be a syntax error because no parenthesis is open at the end of the first line:

EACH f IN *.txt IF size(f) < 1000
                 OR age(f) < 30
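One way to picture the continuation rule is a pre-pass that joins physical lines while a parenthesis is still open. The Python sketch below is an assumption about how this could be done, not the interpreter's actual lexer; the function name is hypothetical, and parentheses inside quoted strings or comments are not handled.

```python
def join_logical_lines(lines):
    """Join physical lines into logical lines while a '(' is still open."""
    logical, buffer, depth = [], [], 0
    for line in lines:
        # Keep the indentation of the first physical line only.
        buffer.append(line.strip() if buffer else line.rstrip())
        depth += line.count("(") - line.count(")")
        if depth <= 0:
            logical.append(" ".join(buffer))
            buffer, depth = [], 0
    if buffer:
        raise SyntaxError("unclosed parenthesis")
    return logical


ok = join_logical_lines([
    "EACH f IN *.txt IF (size(f) < 1000",
    "                  OR age(f) < 30)",
])
bad = join_logical_lines([
    "EACH f IN *.txt IF size(f) < 1000",
    "                 OR age(f) < 30",
])
```

The first call yields one logical line. The second yields two, so the line beginning with OR would be parsed as an unexpectedly indented rule of its own, which is the syntax error described above.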

5.2 Built-in functions

Expressions can use built-in functions, which can be divided into two categories: predicate functions and value functions. Predicate functions return a truth value, and value functions return a value (such as a number). A value function call can't be used as a complete expression by itself; it must be combined with a comparison operator.

Function              Type                       Description
age(filename)         filename -> float          Return age of file in days as floating point number (based on modification date)
base(filename)        filename -> filename       Return file name without (outermost) extension, e.g. for filename "dir/aaa.ext", return "dir/aaa"
date(filename)        filename -> datetime       Return modification date of file
exists(filename)      filename -> boolean        Return true if given file exists
extract(time, part)   datetime, string -> int    Extract part from timestamp. Part is one of "year", "month", "day", "hour", "minute", "second", "week", "weekday".
now()                 -> datetime                Return current time
size(filename)        filename -> int            Return size of file in bytes
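To make the descriptions concrete, here is how a few of the built-in functions could be approximated with the Python standard library. These are sketches based only on the table above, not the interpreter's implementation; extract is left out because the table does not say how "week" and "weekday" are numbered.

```python
import os
import time
from datetime import datetime


def age(filename):
    """Age of the file in days, based on its modification time."""
    return (time.time() - os.path.getmtime(filename)) / 86400.0


def base(filename):
    """File name without its outermost extension."""
    return os.path.splitext(filename)[0]


def date(filename):
    """Modification date of the file as a datetime."""
    return datetime.fromtimestamp(os.path.getmtime(filename))


def exists(filename):
    """True if the given file exists."""
    return os.path.exists(filename)


def now():
    """Current time."""
    return datetime.now()


def size(filename):
    """Size of the file in bytes."""
    return os.path.getsize(filename)
```

With these definitions, base("dir/aaa.ext") returns "dir/aaa", and a rule condition such as size(f) > 1024 corresponds to comparing the value function's result with a number.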

6. Rule evaluation order

Rules are evaluated from the first to the last and the last matching rule is applied.

For example, consider the following rules:

*
NOT *.jpg

The first rule matches all files, but the second rule excludes all *.jpg files. As the result, all files except *.jpg files are included in the file set.

A rule set that may not do what one wishes it to do:

NOT *.jpg
*

This matches all files, including *.jpg files, because the last rule (*) says to include all files. An exclusive rule at the beginning of a rule set never has any effect. Indeed, the FSL interpreter warns you in this case: Warning: exclusive rule at beginning - has no effect.

Usually, you should have inclusion rules at the beginning of the rule set and exclusion rules at the end.
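The "last matching rule is applied" principle can be sketched in a few lines of Python. The representation of rules as (include, glob) pairs and the function name select are hypothetical, and the sketch works on a list of path strings rather than scanning the file system.

```python
from fnmatch import fnmatchcase


def select(paths, rules):
    """Apply (include, glob) rules in order; the last match decides."""
    selected = set()
    for path in paths:
        included = False  # files matched by no rule are not selected
        for include, glob in rules:
            if fnmatchcase(path.lower(), glob.lower()):
                included = include
        if included:
            selected.add(path)
    return selected


paths = ["a.txt", "b.jpg", "dir/c.jpg"]

# *  followed by  NOT *.jpg
first = select(paths, [(True, "*"), (False, "*.jpg")])

# NOT *.jpg  followed by  *
second = select(paths, [(False, "*.jpg"), (True, "*")])
```

Here first contains only a.txt, while second contains all three files, because the trailing * overrides the earlier exclusion.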

