Project in String Processing Algorithms

Algorithms and machine learning
Advanced studies
Implementation and experimental comparison of string processing algorithms, and presentation of the results.


Year  Semester  Date          Period  Language  In charge
2011  spring    18.01-22.02.  3-3     English   Juha Kärkkäinen


Time       Room  Lecturer         Date
Tue 12-14  B119  Juha Kärkkäinen  18.01.2011-22.02.2011


The project consists of

  • implementation of one or more string processing algorithms
  • experimental comparison and/or analysis of the algorithm(s)
  • presentation of the results as a poster

The project can be done in groups of at most four students. In a group, each student is responsible for specific algorithms, and the group as a whole is responsible for the experiments and the poster.

Suitable topics include:

  • exact string matching
  • multiple exact string matching
  • approximate string matching
  • string sorting
  • search trees for strings

Other topics are possible too.

Completing the course

Algorithm implementation

The algorithms can be implemented in any programming language, provided that the programs can be compiled and executed on the department computers. Members of a group should use the same language.

The algorithm implementations must be returned to the instructor by noon on Fri 18.2. In a group, each student returns his or her own implementations separately. See the opening slides below for more details.


The purpose of the experiments is to determine how the performance of the algorithms changes with different inputs, different parameter settings, different algorithms, etc. Choosing the test data is an important part of this.
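As an illustration only, an experiment of this kind could be organised as in the following minimal Python sketch. It is not part of the course material: the function names (naive_search, builtin_search, random_text, time_algorithm, run_experiment), the choice of exact string matching as the topic, and the use of random texts over three alphabet sizes are all assumptions made for the example. A real project would normally also use real test data and the group's own algorithms.

```python
import random
import time


def naive_search(text, pattern):
    """Return all start positions of pattern in text (brute-force comparison)."""
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


def builtin_search(text, pattern):
    """Return the same positions using str.find, as a baseline to compare against."""
    positions = []
    i = text.find(pattern)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)  # step by one so overlapping matches are found
    return positions


def random_text(length, alphabet, seed):
    """Generate a reproducible random text (hypothetical test data generator)."""
    rng = random.Random(seed)
    return "".join(rng.choice(alphabet) for _ in range(length))


def time_algorithm(search, text, pattern, repeats=3):
    """Return the best wall-clock time in seconds over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        search(text, pattern)
        best = min(best, time.perf_counter() - start)
    return best


def run_experiment(text_length=20000, pattern_length=8):
    """Time each algorithm on texts over alphabets of different sizes."""
    rows = []
    for alphabet in ("ab", "acgt", "abcdefghijklmnopqrstuvwxyz"):
        text = random_text(text_length, alphabet, seed=1)
        pattern = text[100:100 + pattern_length]  # taken from the text, so it occurs
        for name, search in (("naive", naive_search), ("builtin", builtin_search)):
            rows.append((len(alphabet), name, time_algorithm(search, text, pattern)))
    return rows


if __name__ == "__main__":
    for sigma, name, seconds in run_experiment():
        print(f"alphabet={sigma:2d} {name:8s} {seconds:.6f} s")
```

The sketch varies one input property (alphabet size) while keeping the text and pattern lengths fixed, and takes the best of several repeats to reduce timing noise. Other parameters, such as pattern length or text size, could be varied in the same way.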


The results of the experiments are presented as a poster. There will be an open poster presentation session where other students and staff of the department can come to view the posters and ask questions.

The poster may be A0 or A1 size, split into A3s or A4s (see the example LaTeX poster below), or a collection of separate A3s or A4s. Poster boards and pins are provided. The boards are large enough to hold an A0 in either orientation.

The poster session takes place Wed 2.3. at 11-15 in B222. Preliminary program:

  • 11-12 Assembling poster boards and setting up the posters
  • 12-13 Lunch break and viewing other posters
  • 13-14 Poster session open to public
  • 14-15 Taking down posters


Each part of the project (implementation, experiments, poster) contributes one third of the total score. In general, the experiment and poster scores will be the same for all members of a group, while the implementation scores will be personal.

Important Dates

The implementation (with documentation) must be returned by noon on Friday 18.2.

The poster session takes place on Wednesday 2.3. at 11-15 in B222.



The three columns labelled "Työt" (Finnish for "works") correspond to the three parts of the project: implementation, experiments, and poster. The maximum for each part is 12 points.

Literature and material

Opening slides: PDF | PS (4 slides/page)

Example poster: example_poster.tgz (unpack with tar xvzf example_poster.tgz and see the file README)


Test data

Pizza&Chili Corpus and Repetitive Corpus