# String Processing Algorithms

## Exam

Year | Semester | Date | Period | Language | In charge |
---|---|---|---|---|---|

2013 | autumn | 29.10-05.12. | 2-2 | English | Juha Kärkkäinen |

## Lectures

Time | Room | Lecturer | Date |
---|---|---|---|

Tue 12-14 | B222 | Juha Kärkkäinen | 29.10.2013-05.12.2013 |

Thu 12-14 | B222 | Juha Kärkkäinen | 29.10.2013-05.12.2013 |

## Exercise groups

Time | Room | Instructor | Date | Observe |
---|---|---|---|---|

Tue 14-16 | B222 | Dominik Kempa | 28.10.2013—06.12.2013 |

Registration for this course starts on Tuesday 8th of October at 9.00.

## General

**RESULTS**

**Course Exam 12.12.2014**

- Problems | solutions
- Registered exercises
- The results are available in TIKLI
- You can see your exam papers and get feedback on Mon, 13.01.2013 at 13:30-14:15 in room B214.

**Renewal/Separate Exam 07.02.2014**

- Problems | solutions
- The results are available in TIKLI
- Grade is based on Yht = max(KoeS+LhP, KoeS*1.2)

**FINAL NOTES**

- Please give feedback for the course. You can use the anonymous feedback form or give direct feedback.
- Please check that the number of solved exercises have been registered correctly.
- Don't forget the Project in String Processing Algorithms in the next period. Another related course in the next period is Biological Sequence Analysis.

**NEWS: **Guest speaker on the lecture on Tuesday 19.11. at 13:15 in B222: Prof. Maxime Crochemore (King's College London): Tracking down repeats in strings.

The course introduces basic algorithms and data structures for string processing including: exact and approximate string matching, string sorting, dictionary data structures and text indexing.

Together with Project in String Processing Algorithms (Period III) this course is one of the three elective course pairs in the subprogram of Algorithms and Machine Learning.

The course is also useful for students in the Master's degree program for Bioinformatics, particularly for those interested in biological sequence analysis.

The students are expected to have basic knowledge on algorithms, data structures, finite automata and algorithm analysis. Recommened prequisite course are Data Structures, Models of Computation, and Design and Analysis of Algorithms.

## Completing the course

The course consists of lectures, exercises and an exam.

**Lectures**: Tue and Thu 12-14 in B222.

Attending lectures is not obligatory but it is useful. Lecture notes covering key facts will be posted on this page, but there will be additional examples and explanations during the lectures.

**Exercises**: Tue 14-16 in B222. (*NOTE: First exercise session is already on the first week, 29.10.* *The problems are already available.*)

Exercise problems will be available on this page about a week before each exercise session. Model solutions will be available after the exercise session.

The students should solve the problems at home and be prepared to present their solutions at the exercise session. The students are not required to solve all problems, but additional points are awarded according to how many problems have been solved:

- 20% marked problems is required and gives 1 point.
- 80% marked problems gives the maximum of 10 points.

**Exam**: Thu 12.12. at 16:00. (*Preliminary, please check the time and place a few days before the exam.*)

The course exam consists of problems similar to those in the exercises. No notes or other material is allowed in the exam.

Passing the course or raising the grade is possible later at a renewal exam (where the exercise points affect the grade as with the course exam) and at separate exams (where exercise points are ignored).

See last year's course for examples of exam problems.

**Grading**

The grading is based on the points from the exercises (max. 10 points) and the exam (max. 50 points).

- 30 points is required to pass and gives the lowest grade 1.
- 50 points gives the highest grade 5.

## Literature and material

All essential material can be found in the lecture notes and in the exercise problems and solutions that will be posted here during the course.

The course will be similar (but not necessarily identical) to last year's course.

**Lecture notes**

- Lecture 29.10. (pages 1-18) PDF | PDF (8 slides/page) Errata: Errors in Fibonacci strings F
_{5}-F_{8}and the De Bruijn graph example have been corrected. - Lecture 31.10. (pages 19-38) PDF | PDF (8 slides/page)
**Change**: Notations for sum of set lcp values have changed, see Definitions 1.6 and 1.7 (pages 20-21). - Lecture 05.11. (pages 39-60) PDF | PDF (8 slides/page)
- Lecture 07.11. (pages 61-80) PDF | PDF (8 slides/page)
- Lecture 12.11. (pages 81-94) PDF | PDF (8 slides/page) Change: Pages 93-94 moved to next lecture with small additional comment at the end.
- Lecture 14.11. (pages 93-108) PDF | PDF (8 slides/page)
- Lecture 19.11. (pages 109-119) PDF | PDF (8 slides/page) Guest speaker at 13:15: Prof. Maxime Crochemore (King's College London): Tracking down repeats in strings.
- Lecture 21.11. (pages 120-144) PDF | PDF (8 slides/page)
- Lecture 26.11. (pages 145-167) PDF | PDF (8 slides/page)
- Lecture 28.11. (pages 168-180) PDF | PDF (8 slides/page)
- Lecture 03.12. (pages 181-201) PDF | PDF (8 slides/page)
- Lecture 05.12. (pages 202-220) PDF | PDF (8 slides/page)

**Exercise problems and solutions**

- Exercises 29.10.: Problems
- Exercises 05.11.: Problems
- Exercises 12.11.: Problems
- Exercises 19.11.: Problems
- Exercises 26.11.: Problems
- Exercises 03.12.: Problems

Practice exercises: Problems

**Books**

The course does not follow any book. The books and articles below can be useful supplementary material but are not required reading.

- M. Crochemore, C. Hancart and T. Lecroq. Algorithms on Strings. Cambridge University Press, 2007.
- M. Crochemore and W. Rytter. Jewels of Stringology. World Scientific Publishing, 2002.
- D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
- E. Ohlebusch. Bioinformatics Algorithms: Sequence Analysis, Genome Rearrangements, and Phylogenetic Reconstruction, 2013.
- G. Navarro and M. Raffinot. Flexible Pattern Matching in Strings. Cambridge University Press, 2002.
- B. Smyth. Computing Patterns in Strings. Addison Wesley, 2003.

**Survey articles**

- S. Faro and T. Lecroq. The exact online string matching problem: A review of the most recent results.
*ACM Computing Surveys*45, 2, Article 13 (March 2013), 42 pages. - G. Navarro. A Guided tour to approximate string matching. ACM Computing Surveys 33(1), 2001, pp. 31 - 88.
- S.J. Puglisi, W.F. Smyth, A.H. Turpin. A Taxonomy of Suffix Array Construction Algorithms. ACM Computing Surveys 39(2), Article 4, 2007.
- G. Navarro and V. Mäkinen. Compressed full-text indexes. ACM Computing Surveys 39(1), Article 2, 2007.

**Links**

- Exact String Matching Algorithms
- SMART: String Matching Research Tool
- Pizza & Chili
- Yuta Mori's implementations of suffix array construction algorithms: SAIS and divsufsort