Biological Sequence Analysis

Basic information

Course code: 582483

Credit units: 5

Subprogramme: Bioinformatics

Level: Advanced studies

Description:

The course covers selected high-throughput methods for the analysis of biological sequences, including advanced alignment methods, Hidden Markov Models, and next-generation sequencing data analysis methods. Prerequisities: Basics of bioinformatics and algorithms.

Exam

02.05.2011 16.00 A111

Year	Semester	Date	Period	Language	In charge
2011	spring	14.03-27.04.	4-4	English	Veli Mäkinen

Lectures

Time	Room	Lecturer	Date
Mon 10-12	D122	Veli Mäkinen	14.03.2011-27.04.2011
Wed 10-12	D122	Veli Mäkinen	14.03.2011-27.04.2011

Exercise groups

Group: 1
Time	Room	Instructor	Date	Observe
Wed 12-14	D122	Veli Mäkinen	16.03.2011—16.03.2011
Wed 12-14	D122	Veli Mäkinen	21.03.2011—29.04.2011

Note:

The lectures and exercise sessions are cancelled on 4th, 6th and 11th of April!

General

The course covers the basic probabilistic methods for modelling and analysis of biological sequences. Special focus will be on the feasibility issues related to the analysis of modern high-throughput sequencing data.

Course starts with Hidden Markov Model (HMM) techniques following loosely Chapters 3-5 of the book Durbin R., Eddy S., Krogh A. and Mitchinson G.: Biological sequence analysis, Cambridge University Press, 1998. The course then continues with techniques essential for efficient analysis of high-throughput sequencing (HTS) data, such as Burrows-Wheeler Transform (BWT) -based indexing for read alignment. The probabilistic framework is again applied to the further steps in HTS data analysis, such as variation calling, alternative splicing, gene expression, and transcription factor binding site studies. Course conludes with further applications of BWT indexing, such as computation of maximal exact matches and maximal unique matches, that are useful for comparative genomics studies.

Prerequisities: Knowledge on basic bioinformatics concepts such as alignment algorithms and score matrices (e.g. from Elements of Bioinformatics course).
Useful background: Knowledge of full-text indexing concepts (e.g. from String processing methods course).

Completing the course

Course consists of lectures and exercises, and of a compulsory project work. Exam gives maximum 40 points, weekly exercises 5 points, and project work 15 points. Grading is based on total points gathered. However, to pass the course, one must pass the exam with at least 20 points, and pass the project work.

Literature and material

We follow the course script. The script is still preliminary and will be updated during the course (especially the latter chapters).

Exercise solutions may appear here (based on voluntary activity).

Schedule (tentative):

14.3. Alignments revisited, sparse dynamic programming, invariant technique and its application to affine gap score computation:
Course script Sections 3.1 and 3.4 quickly revisited, 3.2 and 3.4.5 in detail
16.3. PAM matrices, Hidden Markov Models (HMM):
Course script Sections 3.4.6, 4-4.6
16.3. Exercise 1:
This will be study group, we will solve selected exercises at the end of the course script Chapter 3 on the fly.
21.3. Baum-Welch, More complex HMMs, pair HMMs, suboptimal alignments, sampling:
Course script Sections 4.7-4.10
23.3. Profile HMMs and their application to multiple alignment, jumping alignments:
Course script Sections 4.11-4.12
23.3. Exercise 2:
Assignments 5, 8, 13 from end of Chapter 3 and 1, 2, 3 from end of Chapter 4 in course script
28.3. High-throughput sequencing (HTS) overview:
Course script Section 7-7.1.2 (note: a mistake corrected from read filtering part)
30.3. HTS analysis fundamentals: Burrows-Wheeler transform, FM-index, search space pruning:
Lecture slides & Course script Sections 5.6., 6.2.2, 6.3.1, 6.4.1, 7.1.3
30.3. Exercise 3:
Assignments 4,5,7,9,10,11 from end of Chapter 4 in course script
4.4. No lecture (overlapping course in Viikki)
6.4. No lecture (overlapping course in Viikki)
11.4. No lecture (overlapping course in Viikki)
13.4. RNA-sequencing, splice variants:
Course script Sections 7.1.4-7.1.5, 7.3.3
13.4. Exercise 4:
Assignments 1-6 from the end of Chapter 7 in course script
(Note: BWA pruning was incorrectly defined in the script (now corrected) and the correspondin assignment changed accordingly. An exercise added to the end of the script to show that now it works correctly (with a solution).)
18.4. Variant calling, Peak detection, Co-linear chaining,
Course script Sections 7.3.1, 7.3.2, 7.3.4
20.4. Whole genome alignment, MEMs and MUMs and related algorithms on suffix tree and on FM-index variants
Course script sections 7.2.1. and 7.3.5 and slides on MUMs
20.4. Exercise 5:
Assignments 7-9,11,13 from the end of Chapter 7 in course script
Data for the project work: Genome resequencing competition (team work recommended):
- You are given reference genome A and simulated short read data D from genome B. Genome B is variant of A with some simulated SNPs, and short and large indels.
- The task is to reconstruct B' from A and D such that B' is as close to B as possible.
- Winner is the one obtaining best whole genome alignment score between B' and B.
- Download project data from here
Course exam 2.5: You can choose among two alternatives a) and b):
a) There will be 4 questions on the topics covered during lectures / exercises -> 4 cr with the passed project work.
b) There will be 5th question on parts of lecture script not covered during lectures -> 4 cr without the project work if getting half of the maximum points for 5th question. Project work can then be done separately, giving extra 2 cr.
2.5.-19.5. Guidance for the project work (B218, pop in, or contact by email)
20.5. Genome resequencing competition deadline
Renewal exam / Separate exams: The first separate exam will be renewal exam and you can choose options a) and b) as in course exam 2.5. (see above).
Later separate exams will examine the whole course script and give 4 cr without the project work. Project work can then be done separately on request.

Address: Department of Computer Science, P.O. 68 (Gustaf Hällströmin katu 2b), FI-00014 UNIVERSITY OF HELSINKI, FINLAND
Opening Hours: During spring and autumn semesters Mon - Fri 7.45 - 19.45 (7.45 am - 7.45 pm)
Phone: +358 9 1911 (University switch)
General e-mail: info [at] cs.helsinki.fi
Fax: +358 9 876 4314

Department of Computer Science [pre 2018 site]

University of Helsinki

Faculty of Science