# String Processing Algorithms

58093
5
Algoritmit ja koneoppiminen
Syventävät opinnot
Basic algorithms and data structures for string processing: exact and approximate string matching, string sorting, dictionary data structures, text indexing.
Vuosi Lukukausi Päivämäärä Periodi Kieli Vastuuhenkilö
2010 syksy 02.11-09.12. 2-2 Englanti

## Luennot

Aika Huone Luennoija Päivämäärä
Ti 12-14 B222 Juha Kärkkäinen 02.11.2010-09.12.2010
To 12-14 B222 Juha Kärkkäinen 02.11.2010-09.12.2010

## Harjoitusryhmät

Group: 1
Aika Huone Ohjaaja Päivämäärä Huomioitavaa
To 14-16 B119 Juha Kärkkäinen 08.11.2010—10.12.2010
Group: 2
Aika Huone Ohjaaja Päivämäärä Huomioitavaa
To 16-18 BK107 Juha Kärkkäinen 08.11.2010—10.12.2010

Registration for this course starts on Tuesday 12th of October at 9.00.

## Yleistä

The course introduces basic algorithms and data structures for string processing including: exact and approximate string matching, string sorting, dictionary data structures and text indexing.

Together with Project in String Processing Algorithms (Period III) this course is one of the three optional course pairs in the subprogram of Algorithms and Machine Learning.

The course is also useful for students in the Master's degree program for Bioinformatics, particularly for those interested in biological sequence analysis.

## Kurssin suorittaminen

The course consists of lectures, exercises and an exam.

Grading: 1-10 points for exercises + 25-50 points for exam = 30-60 points in total. To pass you need the minimum in each category.

Results

Exam: Problems | Solutions

Separate exams

04.02.2011: Problems | Solutions

05.04.2011: Problems | Solutions

## Kirjallisuus ja materiaali

Lecture notes

All lecture notes: PDF / PS (4 slides/page). Erratum: Algorithm 4.22: Corrected lines (3) and (4).

1. Lecture 02.11. (pages 1-20): PDF / PS (4 slides/page).   Erratum: Definition of border in Definition 0.1 changed to: If x is both a prefix and a suffix of w, then x is a border of w.
2. Lecture 04.11. (pages 21-35): PDF / PS (4 slides/page)
3. Lecture 09.11. (pages 36-50): PDF / PS (4 slides/page)
4. Lecture 11.11. (pages 51-73): PDF / PS (4 slides/page)
5. Lecture 16.11. (pages 74-92): PDF / PS (4 slides/page)
6. Lecture 18.11. (pages 93-106): PDF / PS (4 slides/page)
7. Lecture 23.11. (pages 107-126): PDF / PS (4 slides/page)  Changes: added explanation on page 116 and small corrections on pages 122,123 and 126.
8. Lecture 25.11. (pages 127-135): PDF / PS (4 slides/page) Erratum: Algorithm 3.34, line (8): comparison direction corrected to "rlcp > RLCP[mid]"
9. Lecture 30.11. (pages 136-151): PDF / PS (4 slides/page)
10. Lecture 02.12. (pages 152-164): PDF / PS (4 slides/page) Erratum: Lemma 4.9: "SA" changed to "SA-1"
11. Lecture 07.12. (pages 165-180): PDF / PS (4 slides/page) Erratum: Algorithm 4.22: Corrected lines (3) and (4).
12. Lecture 09.12. (pages 181-185): PDF / PS (4 slides/page)

Exercises

1. Exercise 11.11. Problems | Solutions
2. Exercise 18.11. Problems | Solutions
3. Exercise 25.11. Problems | Solutions
4. Exercise 02.12. Problems | Solutions
5. Exercise 09.12. Problems | Solutions

Related books and articles

• M. Crochemore and W. Rytter. Jewels of Stringology. World Scientific Publishing, 2002.
• D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
• G. Navarro and M. Raffinot. Flexible Pattern Matching in Strings. Cambridge University Press, 2002.
• B. Smyth. Computing Patterns in Strings. Addison Wesley, 2003.
• G. Navarro.  A Guided tour to approximate string matching. ACM Computing Surveys 33(1), 2001, pp. 31 - 88.
• S.J. Puglisi, W.F. Smyth, A.H. Turpin. A Taxonomy of Suffix Array Construction Algorithms. ACM Computing Surveys 39(2), Article 4, 2007.

Links