String Processing Algorithms

Algorithms and machine learning
Advanced studies
Basic algorithms and data structures for string processing: exact and approximate string matching, string sorting, dictionary data structures, text indexing.


13.12.2012 16.00 B123
Year Semester Date Period Language In charge
2012 autumn 30.10-04.12. 2-2 English Juha Kärkkäinen


Time Room Lecturer Date
Tue 12-14 B222 Juha Kärkkäinen 30.10.2012-04.12.2012
Thu 12-14 B222 Juha Kärkkäinen 30.10.2012-04.12.2012
Mon 12-14 D122 Juha Kärkkäinen 03.12.2012-03.12.2012

Exercise groups

Group: 1
Time Room Instructor Date Observe
Thu 10-12 B119 Dominik Kempa 29.10.2012—07.12.2012
Wed 10-12 B222 Dominik Kempa 05.12.2012—05.12.2012
Group: 2
Time Room Instructor Date Observe
Thu 14-16 DK116 Dominik Kempa 29.10.2012—07.12.2012

Registration for this course starts on Tuesday 9th of October at 9.00.



Course Exam 13.12.2012

Renewal/Separate Exam 08.02.2013

Separate Exam 20.09.2013


The course introduces basic algorithms and data structures for string processing including: exact and approximate string matching, string sorting, dictionary data structures and text indexing.

Together with Project in String Processing Algorithms (Period III) this course is one of the three elective course pairs in the subprogram of Algorithms and Machine Learning.

The course is also useful for students in the Master's degree program for Bioinformatics, particularly for those interested in biological sequence analysis.

The students are expected to have basic knowledge on algorithms, data structures, finite automata and algorithm analysis. Recommened prequisite course are Data Structures, Models of Computation, and Design and Analysis of Algorithms.

Completing the course

The course consists of lectures, exercises and an exam.

Lectures: Tue and Thu 12-14 in B222.

Attending lectures is not obligatory but it is useful. Lecture notes covering key facts will be posted on this page, but there will be additional examples and explanations during the lectures.

Exercises: Thu 10-12 in B119 and Thu 14-16 in DK116.

Exercise problems will be available on this page about a week before each exercise session. Model solutions will be available after the exercise session.
The students should solve the problems at home and be prepared to present their solutions at the exercise session. The students are not required to solve all problems, but additional points are awarded according to how many problems have been solved:

  • 20% marked problems is required and gives 1 point.
  • 80% marked problems gives the maximum of 10 points.

Exam: Thu 13.12. at 16:00.

The course exam consists of problems similar to those in the exercises. No notes or other material is allowed in the exam.

Passing the course or raising the grade is possible later at a renewal exam (where the exercise points affect the grade as with the course exam) and at separate exams (where exercise points are ignored).

See last year's course for examples of exam problems.


The grading is based on the points from the exercises (max. 10 points) and the exam (max. 50 points).

  • 30 points is required to pass and gives the lowest grade 1.
  • 50 points gives the highest grade 5.

Literature and material

All essential material can be found in the lecture notes and in the exercise problems and solutions that will be posted here during the course.

The course will be similar (but not necessarily identical) to last year's course.

Lecture notes

  1. Lecture 30.10. (pages 1-16): PDF | PS (4 slides/page) - Change: Added Example 1.9 and moved slides 17-26 to Lecture 2.
  2. Lecture 01.11. (pages 17-42): PDF | PS (4 slides/page) - Errata: Wrong Theorem numbers on pages 29 and 35 (now corrected). Wrong leaf path compated trie size on page 21 (now corrected).
  3. Lecture 06.11. (pages 43-57): PDF | PS (4 slides/page)
  4. Lecture 06.11. (pages 58-72): PDF | PS (4 slides/page)
  5. Lecture 13.11. (pages 73-88): PDF | PS (4 slides/page)
  6. Lecture 15.11. (pages 89-96): PDF | PS (4 slides/page)
  7. Lecture 20.11. (pages 97-117): PDF | PS (4 slides/page)
  8. Lecture 22.11. (pages 118-137): PDF | PS (4 slides/page)
  9. Lecture 27.11. (pages 138-162): PDF | PS (4 slides/page) - Erratum: Undefined i in CreateNode on slide 144 (now corrected).
  10. Lecture 29.11. (pages 163-173): PDF | PS (4 slides/page)
  11. Lecture 03.12. (pages 174-190): PDF | PS (4 slides/page)
  12. Lecture 04.12. (pages 191-206): PDF | PS (4 slides/page)

Exercises and solutions

  1. Exercise 1 (01.11.) Problems
  2. Exercise 2 (08.11.) Problems
  3. Exercise 3 (15.11.) Problems
  4. Exercise 4 (22.11.) Problems
  5. Exercise 5 (29.11.) Problems
  6. Exercise 6 (05.12.) Problems

Practice problems

Related books and articles

The course does not follow any book. The books can be useful supplementary material but are not required reading.

  • M. Crochemore, C. Hancart and T. Lecroq. Algorithms on Strings. Cambridge University Press, 2007.
  • M. Crochemore and W. Rytter. Jewels of Stringology. World Scientific Publishing, 2002.
  • D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
  • G. Navarro and M. Raffinot. Flexible Pattern Matching in Strings. Cambridge University Press, 2002.
  • B. Smyth. Computing Patterns in Strings. Addison Wesley, 2003.
  • G. Navarro.  A Guided tour to approximate string matching. ACM Computing Surveys 33(1), 2001, pp. 31 - 88.
  • S.J. Puglisi, W.F. Smyth, A.H. Turpin. A Taxonomy of Suffix Array Construction Algorithms. ACM Computing Surveys 39(2), Article 4, 2007.