Biological Sequence Analysis
Vuosi | Lukukausi | Päivämäärä | Periodi | Kieli | Vastuuhenkilö |
---|---|---|---|---|---|
2016 | kevät | 18.01-03.03. | 3-3 | Englanti | Esko Ukkonen |
Luennot
Aika | Huone | Luennoija | Päivämäärä |
---|---|---|---|
Ma 12-14 | B119 | Esko Ukkonen | 18.01.2016-03.03.2016 |
To 10-12 | B119 | Esko Ukkonen | 18.01.2016-03.03.2016 |
Harjoitusryhmät
Aika | Huone | Ohjaaja | Päivämäärä | Huomioitavaa |
---|---|---|---|---|
Ma 14-16 | B222 | Dominik Kempa | 25.01.2016—04.03.2016 | The meeting time may still change! |
Yleistä
The course covers selected computational methods for the analysis of biological sequences. Topics include advanced alignment methods, algorithms around hidden Markov models, and core data structures for read alignment and genome analysis. The course will be organized in *active learning* mode meaning that there will be very few lectures but more home work and study groups during the course. In study groups we discuss the week's topic. It is essential that the students read in advance the texts for each study group session (the exact form of study group work depends on the number of participants). Exercises test the knowledge of the study group material and their extensions to related topics. There will be some tailored assignments depending on the student's background: choice between deeper theory assignments for mathematically oriented and more software implementation-oriented assignments for those who prefer learning by doing.There will be no course exam. Instead, you have to show sufficient activity during the course to pass.
Kurssin suorittaminen
There is no course exam. The grading is based on the activity during the course. The study groups are mandatory (you should attend at least 4 out of 6). Exercises determine the grade: 50% gives 1, 85% gives 5. If you cannot attend some exercise session you may return your solutions before the session by email to Dominik Kempa at dominik.kempa[at]gmail.com
Content and weekly schedule
- Mon18.1. 12-14: Opening lecture: Bioinformatics? Biology primer, orthologs and paralogs, sequence homology, alignments, score schemes, log-odds, BLOSUM, multiple alignments, signals (motifs) in DNA, PWM and PSSM, k-mers, GC-content, GC-skew, Markov-chains, CAI. [slides.ppt] [slides.pdf]
- Thu 21.1. 10-12: Opening lect (cont)
- Mon 25.1. 12-14: Study group: Dynamic programming for various alignment models + shortest detour. Sections 6.1-6.1.2, 6.3-6.4.3. of Mäkinen et al book [MBCT].
- Mon 25.1. 14-16 B222: Exercise session 1: [problems] [solutions]
- Thu 28.1. 10-12 B119: Exercise session 2: [problems] [solutions]
- Mon 1.2. 12-14 B119: Study group: LCS, sparse dynamic programming, affine gap model, gene alignment. Reading: Sections 6.2, 6.4.4, 6.4.5, 6.5 of [MBCT]
- Thu 4.2. 10-12 B119: Exercise session 3: [problems] [solutions]
-
Mon 8.2. 12-14 B119: Study group:
Hidden Markov Models, Viterbi algorithm, Forward and Backward algorithms, HMM parameter estimation
Reading: Chapter 7 of [MBCT] - Thu 11.2. 10-12 B119: Exercise session 4: [problems] [solutions]
- Mon 15.2. 12-14 B119: Study group: Gene alignment, Multiple alignment: scoring schemes, progressive alignment, DAG alignment, jumping alignment. Reading: Sections 6.5 & 6.6 of [MBCT]
- Thu 18.2. 10-12 B119: Lecture: profile HMMs and index structures (suffix trees and suffix arrays)
- Mon 22.2. 12-14 B119: Study group: Classical indexes: k-mer index, suffix array, suffix tree, applications. Reading: Sections 8, 9.1-2 of [MBCT]
- Mon 22.2. 14-16 B222: Exercise session 5: [problems] [solutions]
- Thu 25.2. 10-12 B119: Exercise session 6: [problems] [solutions]
-
Mon 29.2. 12-14 B119: Study group:
Read alignment: pattern partitioning, dynamic programming along suffix tree paths;
comparing genomes without an alignment; Genomics: variation calling
Reading: Chapter 10, pages 201-206; Chapter 11, pages 220-221 & 229-233; Chapter 14, pages 307-312 of [MBCT] -
Thu 3.3. 10-12 B119: Exercise session 7: [problems]
Kirjallisuus ja materiaali
- [MBCT] V. Mäkinen, D. Belazzougui, F. Cunial, and A. Tomescu. Genome-Scale Algorithm Design: Biological sequence analysis in the era of high-throughput sequencing. Cambridge University Press, 2015 [access electronic version].
- R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic models of proteins and nucleid acids. Cambridge University Press, 1998 [access electronic version].
- R. C. Deonier, M. S. Waterman, S. Tavaré. Computational Genome Analysis: An Introduction. Springer, 2005 [access electronic version].
Note: electronic versions require access from the university network.