Biological Sequence Analysis

Perustiedot

Kurssikoodi: 582483

Opintopisteet: 5

Erikoistumislinja: Bioinformatiikka

Taso: Syventävät opinnot

Kuvaus:

The course covers selected high-throughput methods for the analysis of biological sequences, including advanced alignment methods, Hidden Markov Models, and next-generation sequencing data analysis methods. Prerequisities: Basics of bioinformatics and algorithms.

Koe

02.05.2011 16.00 A111

Vuosi	Lukukausi	Päivämäärä	Periodi	Kieli	Vastuuhenkilö
2011	kevät	14.03-27.04.	4-4	Englanti	Veli Mäkinen

Luennot

Aika	Huone	Luennoija	Päivämäärä
Ma 10-12	D122	Veli Mäkinen	14.03.2011-27.04.2011
Ke 10-12	D122	Veli Mäkinen	14.03.2011-27.04.2011

Harjoitusryhmät

Group: 1
Aika	Huone	Ohjaaja	Päivämäärä	Huomioitavaa
Ke 12-14	D122	Veli Mäkinen	16.03.2011—16.03.2011
Ke 12-14	D122	Veli Mäkinen	21.03.2011—29.04.2011

Note:

The lectures and exercise sessions are cancelled on 4th, 6th and 11th of April!

Yleistä

The course covers the basic probabilistic methods for modelling and analysis of biological sequences. Special focus will be on the feasibility issues related to the analysis of modern high-throughput sequencing data.

Course starts with Hidden Markov Model (HMM) techniques following loosely Chapters 3-5 of the book Durbin R., Eddy S., Krogh A. and Mitchinson G.: Biological sequence analysis, Cambridge University Press, 1998. The course then continues with techniques essential for efficient analysis of high-throughput sequencing (HTS) data, such as Burrows-Wheeler Transform (BWT) -based indexing for read alignment. The probabilistic framework is again applied to the further steps in HTS data analysis, such as variation calling, alternative splicing, gene expression, and transcription factor binding site studies. Course conludes with further applications of BWT indexing, such as computation of maximal exact matches and maximal unique matches, that are useful for comparative genomics studies.

Prerequisities: Knowledge on basic bioinformatics concepts such as alignment algorithms and score matrices (e.g. from Elements of Bioinformatics course).
Useful background: Knowledge of full-text indexing concepts (e.g. from String processing methods course).

Kurssin suorittaminen

Course consists of lectures and exercises, and of a compulsory project work. Exam gives maximum 40 points, weekly exercises 5 points, and project work 15 points. Grading is based on total points gathered. However, to pass the course, one must pass the exam with at least 20 points, and pass the project work.

Kirjallisuus ja materiaali

We follow the course script. The script is still preliminary and will be updated during the course (especially the latter chapters).

Exercise solutions may appear here (based on voluntary activity).

Schedule (tentative):

14.3. Alignments revisited, sparse dynamic programming, invariant technique and its application to affine gap score computation:
Course script Sections 3.1 and 3.4 quickly revisited, 3.2 and 3.4.5 in detail
16.3. PAM matrices, Hidden Markov Models (HMM):
Course script Sections 3.4.6, 4-4.6
16.3. Exercise 1:
This will be study group, we will solve selected exercises at the end of the course script Chapter 3 on the fly.
21.3. Baum-Welch, More complex HMMs, pair HMMs, suboptimal alignments, sampling:
Course script Sections 4.7-4.10
23.3. Profile HMMs and their application to multiple alignment, jumping alignments:
Course script Sections 4.11-4.12
23.3. Exercise 2:
Assignments 5, 8, 13 from end of Chapter 3 and 1, 2, 3 from end of Chapter 4 in course script
28.3. High-throughput sequencing (HTS) overview:
Course script Section 7-7.1.2 (note: a mistake corrected from read filtering part)
30.3. HTS analysis fundamentals: Burrows-Wheeler transform, FM-index, search space pruning:
Lecture slides & Course script Sections 5.6., 6.2.2, 6.3.1, 6.4.1, 7.1.3
30.3. Exercise 3:
Assignments 4,5,7,9,10,11 from end of Chapter 4 in course script
4.4. No lecture (overlapping course in Viikki)
6.4. No lecture (overlapping course in Viikki)
11.4. No lecture (overlapping course in Viikki)
13.4. RNA-sequencing, splice variants:
Course script Sections 7.1.4-7.1.5, 7.3.3
13.4. Exercise 4:
Assignments 1-6 from the end of Chapter 7 in course script
(Note: BWA pruning was incorrectly defined in the script (now corrected) and the correspondin assignment changed accordingly. An exercise added to the end of the script to show that now it works correctly (with a solution).)
18.4. Variant calling, Peak detection, Co-linear chaining,
Course script Sections 7.3.1, 7.3.2, 7.3.4
20.4. Whole genome alignment, MEMs and MUMs and related algorithms on suffix tree and on FM-index variants
Course script sections 7.2.1. and 7.3.5 and slides on MUMs
20.4. Exercise 5:
Assignments 7-9,11,13 from the end of Chapter 7 in course script
Data for the project work: Genome resequencing competition (team work recommended):
- You are given reference genome A and simulated short read data D from genome B. Genome B is variant of A with some simulated SNPs, and short and large indels.
- The task is to reconstruct B' from A and D such that B' is as close to B as possible.
- Winner is the one obtaining best whole genome alignment score between B' and B.
- Download project data from here
Course exam 2.5: You can choose among two alternatives a) and b):
a) There will be 4 questions on the topics covered during lectures / exercises -> 4 cr with the passed project work.
b) There will be 5th question on parts of lecture script not covered during lectures -> 4 cr without the project work if getting half of the maximum points for 5th question. Project work can then be done separately, giving extra 2 cr.
2.5.-19.5. Guidance for the project work (B218, pop in, or contact by email)
20.5. Genome resequencing competition deadline
Renewal exam / Separate exams: The first separate exam will be renewal exam and you can choose options a) and b) as in course exam 2.5. (see above).
Later separate exams will examine the whole course script and give 4 cr without the project work. Project work can then be done separately on request.

Osoite: Tietojenkäsittelytieteen laitos, PL 68 (Gustaf Hällströmin katu 2b), 00014 Helsingin yliopisto
Aukioloajat: Normaalisti syys- ja kevätlukukausien aikana ma - pe klo 7.45-19.45.
Puhelin: 0294 1911 (yliopiston vaihde)
Sähköposti: Palveluosoitteet
Faksi: 09 876 4314

Kirjaudu sivulle | Webmaster

Department of Computer Science [pre 2018 site]

Helsingin Yliopisto

Matemaattis-luonnontieteellinen tiedekunta