Tree-languages project work topics

More than one person/group can pick the same topic, if desired. We encourage you to do the project work in pairs, and there are more topics available for pair work than individual work. You may also propose a topic of your own.

Technical details about programming projects are available. Links to the references are at the end of this page.

You should complete the milestones one-by-one (the order is approximate). If you do not complete all of them, the grades will depend on the number and relevance of milestones completed. Your report should reflect the milestones completed, lessons learned from them and reasons for possible incomplete milestones.

If you at any point during the project get stuck with a milestone, ask for help from the teachers. The projects are meant to be a learning experience, not just measurement

This is not a software development project. You do not need to be overly concerned about resource usage, design documentation, error recovery and such. You do need to have a clear, modular and general implementation of the issues handled on the course (like the automata). You should, however, implement your program so that it can print out a human-readable description of internal datastructures related to the grammar and automata, so that it is easier to see what is going on when debugging..

1. Tree-automata recognizer (programming)

Scope

Individual

Description

Write a program that takes as input a regular tree grammar and an XML document and decides whether the tree represented by the document belongs to the language of the grammar. The implementation shall do this via construction of non-deterministic push-down tree automata.

Milestones

References

The handout from the introductory lecture, Neumanns thesis, Binary Queries.

1B. Tree-automata recognizer in pure C (programming)

Otherwise exactly the same as project work 1, but the implementation should be in pure C, running on top of the Expat parser and portable to Symbian (minimal resource use, programming style the same as in Expat). Contact Mika Raento and Renaud Petit (petit@cs.helsinki.fi) if you are interested in this.

References

References of 1, plus Expat.

2. Derivatives-based recognizer (programming)

Scope

Individual

Description

Write a program that takes as input a regular tree grammar and an XML document and decides whether the tree represented by the document belongs to the language of the grammar. The implementation shall do this by using derivatives of regular expressions.

Milestones

References

The handout from the introductory lecture, the original derivatives of regular expressions paper, documentation of Clark's Relax-NG validator.

3. Algorithm for intersection and difference of grammars (theoretical)

Scope

Individual

Description

Describe in sufficient detail for implementation an algorithm that can calculate the intersection and difference of two forest-regular grammars. No implementation necessary.

References

The handout from the introductory lecture, TBD

4. Tree-automata query implementation (programming)

Scope

Pair

Description

Implement project work 1, plus implement querying of the document via marked non-terminals in the grammar. (unary queries only).

Milestones

References

The handout from the introductory lecture, Neumanns thesis, Binary Queries.

5. Extension of recognizer for attributes (programming)

Scope

Pair

Description

Implement project work 1 or 2, plus implement specification of attributes in a grammar and recognizer. Attributes should be specifiable as unordered regular expressions (?, ( ), |, & connectors but no sequences or repetitions).

Milestones

References

The handout from the introductory lecture, Neumanns thesis, Binary Queries.

6. Lazy construction of deterministic LPA (programming)

Scope

Pair

Description

Implement project work 1, plus implement the lazy construction of a deterministic LPA.

Milestones

References

The handout from the introductory lecture, Neumanns thesis, Binary Queries.

References