Biological Sequence Analysis (guided self study)

Basic information

Course code: 582483

Credit units: 5

Subprogramme: Bioinformatics

Level: Advanced studies

Description:

The course covers selected high-throughput methods for the analysis of biological sequences, including advanced alignment methods, Hidden Markov Models, and next-generation sequencing data analysis methods. Prerequisites: Basics of bioinformatics and algorithms.

Year	Semester	Date	Period	Language	In charge
2015	spring	12.01-26.02.	3-3	English	Veli Mäkinen

Lectures

Time	Room	Lecturer	Date
Mon 12-14	B222	Veli Mäkinen	12.01.2015-23.02.2015

Exercise groups

Group: 1
Time	Room	Instructor	Date	Observe
Thu 10-12	B222	Veli Mäkinen	15.01.2015-26.02.2015

General

The course covers selected high-throughput methods for the analysis of biological sequences. Topics include advanced alignment methods, algorithms around hidden Markov models, and core data structures for read alignment and genome analysis. This edition of the course is guided self study, meaning that more home study is expected, as there is one fewer lecture slot than usual. In the study groups we discuss the week's topic (the exact form of study group work depends on the number of participants). Exercises test knowledge of the study group material and its extensions to related topics. Some assignments are tailored to the student's background: a choice between deeper theory assignments for the mathematically oriented and more laborious implementation assignments for those who prefer learning by doing.

Completing the course

There is no course exam. Grading is based on activity during the course. Monday study groups are mandatory (you should attend at least 4 out of 6). Exercises determine the grade: 50% gives 1, 85% gives 5. Solutions can be returned by email.

Content

Mon 12.1 12-14. Introductory lecture: Biology primer, Markov chains, alignments, score schemes, log-odds, BLOSUM, GC-content, GC-skew, CAI. Sections 1-1.2 + [slides.ppt] [slides.pdf]
Thu 15.1 10-12. Exercise 1 [pdf] [solution ex6.py]
Mon 19.1 12-14. Study group: Dynamic programming for various alignment models + shortest detour. Sections 6.1-6.1.2, 6.3-6.4.3.
Thu 22.1 10-12. Exercise 2 [pdf] [solutions]
Mon 26.1 12-14. Study group: Invariant technique, sparse dynamic programming, affine gap model. Sections 6.2 [..Ha*], 6.4.4 [He*..M*], 6.4.5 [N*..]
Thu 29.1 10-12. Exercise 3 [pdf] [solutions]
Mon 2.2 12-14. Study group: Hidden Markov Models, forward, backward, Baum-Welch. Chapter 7
Thu 5.2 10-12. Exercise 4 [pdf]
Mon 9.2 12-14. Study group: Multiple alignments, jumping alignments. Section 6.6
Thu 12.2 10-12. Exercise 5 [pdf] [solutions]
Mon 16.2 12-14. Study group: High-throughput sequencing (HTS) overview, variant calling, Burrows-Wheeler transform and indexes, search space pruning. Sections 1.3, 9.1-9.4.1, 10-10.5, 14.1.1 [slides.pptx] [slides.pdf] (it is enough to focus on the conceptual ideas; data structure compression techniques are covered in a concurrent data compression techniques course)
Thu 19.2 10-12. Exercise 6 [pdf]
Mon 23.2 12-14. Study group: Genome analysis, maximal repeats, unique and exact matches on suffix tree and on bidirectional BWT index. Sections 8.4, 11.1
Thu 26.2 10-12. Exercise 7 [pdf]
An alternative way to take the course is by separate exam: http://www.cs.helsinki.fi/exams
Take the variation calling challenge project in period IV to learn practical skills related to the topic of the course.
Transcriptomics and other "upstream" analyses that build on the underlying sequence analysis are covered in Algorithms in Molecular Biology, period IV.

Literature and material

The course is based on selected chapters from the book:

Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, and Alexandru Tomescu. Genome-Scale Algorithm Design: Biological sequence analysis in the era of high-throughput sequencing. Cambridge University Press, in press.

More in-depth probabilistic modeling of alignments and hidden Markov models can be found in the book:

R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press, 1998.

The first lecture is largely based on the book:

R. C. Deonier, S. Tavaré, and M. S. Waterman. Computational Genome Analysis: An Introduction. Springer, 2005.
