String Processing Algorithms

58093
5
Algorithms and machine learning
Advanced studies
Basic algorithms and data structures for string processing: exact and approximate string matching, string sorting, dictionary data structures, text indexing.
Year Semester Date Period Language In charge
2010 autumn 02.11-09.12. 2-2 English

Lectures

Time Room Lecturer Date
Tue 12-14 B222 Juha Kärkkäinen 02.11.2010-09.12.2010
Thu 12-14 B222 Juha Kärkkäinen 02.11.2010-09.12.2010

Exercise groups

Group: 1
Time Room Instructor Date Observe
Thu 14-16 B119 Juha Kärkkäinen 08.11.2010—10.12.2010
Group: 2
Time Room Instructor Date Observe
Thu 16-18 BK107 Juha Kärkkäinen 08.11.2010—10.12.2010

Registration for this course starts on Tuesday 12th of October at 9.00.

General

The course introduces basic algorithms and data structures for string processing including: exact and approximate string matching, string sorting, dictionary data structures and text indexing.

Together with Project in String Processing Algorithms (Period III) this course is one of the three optional course pairs in the subprogram of Algorithms and Machine Learning.

The course is also useful for students in the Master's degree program for Bioinformatics, particularly for those interested in biological sequence analysis.

Completing the course

The course consists of lectures, exercises and an exam.

Grading: 1-10 points for exercises + 25-50 points for exam = 30-60 points in total. To pass you need the minimum in each category.

Results

Exam: Problems | Solutions

Separate exams

04.02.2011: Problems | Solutions

 05.04.2011: Problems | Solutions

Literature and material

Lecture notes

All lecture notes: PDF / PS (4 slides/page). Erratum: Algorithm 4.22: Corrected lines (3) and (4).

  1. Lecture 02.11. (pages 1-20): PDF / PS (4 slides/page).   Erratum: Definition of border in Definition 0.1 changed to: If x is both a prefix and a suffix of w, then x is a border of w.
  2. Lecture 04.11. (pages 21-35): PDF / PS (4 slides/page)
  3. Lecture 09.11. (pages 36-50): PDF / PS (4 slides/page)
  4. Lecture 11.11. (pages 51-73): PDF / PS (4 slides/page)
  5. Lecture 16.11. (pages 74-92): PDF / PS (4 slides/page)
  6. Lecture 18.11. (pages 93-106): PDF / PS (4 slides/page)
  7. Lecture 23.11. (pages 107-126): PDF / PS (4 slides/page)  Changes: added explanation on page 116 and small corrections on pages 122,123 and 126.
  8. Lecture 25.11. (pages 127-135): PDF / PS (4 slides/page) Erratum: Algorithm 3.34, line (8): comparison direction corrected to "rlcp > RLCP[mid]"
  9. Lecture 30.11. (pages 136-151): PDF / PS (4 slides/page)
  10. Lecture 02.12. (pages 152-164): PDF / PS (4 slides/page) Erratum: Lemma 4.9: "SA" changed to "SA-1"
  11. Lecture 07.12. (pages 165-180): PDF / PS (4 slides/page) Erratum: Algorithm 4.22: Corrected lines (3) and (4).
  12. Lecture 09.12. (pages 181-185): PDF / PS (4 slides/page)

Exercises

  1. Exercise 11.11. Problems | Solutions
  2. Exercise 18.11. Problems | Solutions
  3. Exercise 25.11. Problems | Solutions
  4. Exercise 02.12. Problems | Solutions
  5. Exercise 09.12. Problems | Solutions

Practice problems | Solutions

Related books and articles

  • M. Crochemore and W. Rytter. Jewels of Stringology. World Scientific Publishing, 2002.
  • D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
  • G. Navarro and M. Raffinot. Flexible Pattern Matching in Strings. Cambridge University Press, 2002.
  • B. Smyth. Computing Patterns in Strings. Addison Wesley, 2003.
  • G. Navarro.  A Guided tour to approximate string matching. ACM Computing Surveys 33(1), 2001, pp. 31 - 88.
  • S.J. Puglisi, W.F. Smyth, A.H. Turpin. A Taxonomy of Suffix Array Construction Algorithms. ACM Computing Surveys 39(2), Article 4, 2007.

Links