Algorithms in Molecular Biology

Syventävät opinnot
The course covers selected algorithm design areas that have grown around molecular biology problems, including perfect phylogenies, genome rearrangements, and motif discovery, and some selected molecular biology problems in transcriptomics and genomics, that exploit reductions to graph problems. Prerequisites: basics of bioinformatics and algorithms. No course exam. Assessment will be by exercises. Alternatively one can take a separate exam. Course follows partly a textbook: Mäkinen, Belazzougui, Cunial, Tomescu. Genome-Scale Algorithm Design. Cambridge University Press, 2015.


05.05.2015 16.00 A111
Vuosi Lukukausi Päivämäärä Periodi Kieli Vastuuhenkilö
2015 kevät 09.03-30.04. 4-4 Englanti Veli Mäkinen


Aika Huone Luennoija Päivämäärä
Ma 12-14 C222 Veli Mäkinen 09.03.2015-30.04.2015
To 10-12 C222 Veli Mäkinen 09.03.2015-30.04.2015


Group: 1
Aika Huone Ohjaaja Päivämäärä Huomioitavaa
Ma 10-12 C222 Anna Kuosmanen 16.03.2015—27.04.2015


The course covers selected algorithm design areas that have grown around molecular biology problems. Topics include fragment assembly, transcriptomics and metagenomics data analysis, haplotyping, and evolution-related algorithmics such as haplotyping, motif discovery, and perfect phylogeny.

Kurssin suorittaminen

The course consists of lectures, study groups, and exercises, as follows:

First week

  • Mon 12-14 - First lecture
  • Thu 10-12 - Study group

Other weeks

  • Mon 10-12 - Exercise session on the previous week's topics
  • Mon 12-14 - Lecture
  • Thu 10-12 - Study group

Lectures end with a list of topics and their assignments to the students. The assigned topic is to be studied carefully before gathering to the study group meeting on Thursday, with the fellow students studying the same topic, to form a common understanding of the assigned topic. Then the groups will be mixed so that each group has an expert from each topic, and the rest of the Thursday's study group is devoted to teaching each others the material adopted. The exercise session tests the understanding of all topics.

There will be an exam giving 48 points at the maximum. Active participation to exercises gives at the maximum 12 points (20%->1p,80%->12p, linear scale). You should attend 5/7 study group sessions and get at least 1p from the exercise session to enter the exam. The grading is then based on the maximum of a) the sum of points from exercises (max 12) and exam (max 48) and b) the exam points scaled to 60: ~30 points -> 1 and ~50 points -> 5, depending on the difficulty of the exam.


  • Mon 9.3 12-14. Introduction to course content + Fragment assembly I (lecture on de Bruijn graphs, contig assembly). Sections 13-13.2.2.
  • Thu 12.3 10-12.  Fragment assembly II (study group on scaffolding and gap filling). Sections 13.3-13.4.
  • Mon 16.3 10-12. Exercise 1 [solutions]
  • Mon 16.3 12-14. Transcriptomics I (lecture on splicing graphs, annotated transcript expression prediction, differential expression analysis). Sections 15-15.1.
  • Thu 19.3 10 -12. Transcriptomics II (study group on gene alignment, co-linear chaining). Sections 6.5 and 15.4.
  • Mon 23.3 10-12. Exercise 2 [solutions]
  • Mon 23.3 12-14. Transcriptomics III (lecture on path covers in splicing graphs). Sections 5.4.2, 15.2-15.2.2.
  • Thu 26.3 10-12. Transcriptomics IV (study group on simultaneous transcript assembly and expression estimation). Section 15.3.
  • Mon 30.3 10-12. Exercise 3. [solutions]
  • Mon 30.3 12-14. Haplotype assembly (lecture). Section 14.3.
  • Eastern break 2.-8.4.
  • Thu 9.4 10-12. Metagenomics (study group on multi-read and coverage-sensitive methods). Sections 5.3.3 and16.1.2.
  • Mon 13.4 10-12. Exercise 4. [solutions]
  • Mon 13.4 12-14 Sequence motifs I (lecture). Jones & Pevzner Chapter 12
  • Thu 16.4 10-12 Sequence motifs  II (study group on rigid patterns, maximality, density, quorum, basis). Parida Section 6.4.
  • Mon 20.4 10-12. Exercise 5. [solutions]
  • Mon 20.4 12-14. NO LECTURE
  • Thu 23.4 10-12. Permutation patterns I (lecture on Parikh Mapping / naming technique, the intervals problem). Parida Sections 10.4-10.5.1, 13.5.1.
  • Mon 27.4 10-12. NO EXERCISE
  • Mon 27.4 12-14. Permutation patterns II (study group on Uno-Yagiura RC algorithm). Parida Section 10.5.2.
  • Thu 30.4 12-14. Exercise 6. [solutions]
  • Check course exam date here:
  • Check separate exam dates here:
  • There is a project course following in the next period with hands-on training on fragment assembly algorithms.


Kirjallisuus ja materiaali

The first half follows the book

  • Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, and Alexandru I. Tomescu. Genome-Scale Algorithms Design: Biological sequence analysis in the era of high-throughput sequencing. Cambridge University Press. In press.

The latter half uses material from the books

  • Neil C. Jones and Pavel A. Pevzner. An Introduction to Bionformatics Algorithms. The MIT Press, 2004.
  • Laxmi Parida. Pattern Discovery in Bioinformatics: Theory & Algorithms. Chapman & Hall, CRC, 2008.