Biological Sequence Analysis
5
Bioinformatics
Advanced studies
The course covers selected high-throughput methods for the analysis of biological sequences, including advanced alignment methods, Hidden Markov Models, and next-generation sequencing data analysis methods. Prerequisites: basics of bioinformatics and algorithms.
Exam
27.02.2014
16.00
A111
Year  Semester  Date  Period  Language  In charge 

2014  spring  13.01.-20.02.  3-3  English  Esko Ukkonen
Lectures
Time  Room  Lecturer  Date 

Mon 12-14  B222  Esko Ukkonen  13.01.2014-20.02.2014
Thu 10-12  B222  Esko Ukkonen  13.01.2014-20.02.2014
Exercise groups
Time  Room  Instructor  Date  Observe 

Mon 10-12  B222  Djamal Belazzougui  13.01.2014-21.02.2014
Mon 10-12  B222  Djamal Belazzougui  24.02.2014-24.02.2014
General
 Content

Exercise session 1 (Monday 20 Jan): problems 1.1, 1.2, 1.4, 2.1, 2.4 from the textbook.

Exercise session 2 (Monday 27 Jan):
 Problem 2.5 from the textbook.
 Problem 2.8 from the textbook.
 Problem 2.9 from the textbook.
 Explain how the PAM250 scoring matrix has been constructed.
 Explain how the BLOSUM50 and BLOSUM62 matrices have been constructed.
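
Both matrix families end up as tables of scaled log-odds scores, s(a,b) = log(q_ab / (p_a p_b)) up to a scale factor. The sketch below illustrates only that final step. It assumes a made-up two-letter alphabet and invented pair counts; the function name `log_odds_matrix` and every number in it are hypothetical and are not PAM or BLOSUM data. How the pair frequencies q_ab are estimated is where the two constructions differ, and that part is not shown.

```python
import math


def log_odds_matrix(pair_counts, scale=2.0):
    """Turn symmetric aligned-pair counts into rounded log-odds scores.

    pair_counts[(a, b)] is a hypothetical count of alignment columns in
    which a is paired with b, stored for both orders (a, b) and (b, a).
    scale=2.0 expresses the scores in half-bit units.
    """
    total = sum(pair_counts.values())
    # Target frequencies q_ab: how often a is observed aligned with b.
    q = {pair: n / total for pair, n in pair_counts.items()}
    # Background frequencies p_a: marginal of q over the partner residue.
    p = {}
    for (a, _b), freq in q.items():
        p[a] = p.get(a, 0.0) + freq
    # Score = scaled log2 of (observed frequency / frequency expected by chance).
    return {
        (a, b): round(scale * math.log2(freq / (p[a] * p[b])))
        for (a, b), freq in q.items()
    }


# Invented toy counts: identities are more common than mismatches.
toy_counts = {("A", "A"): 40, ("A", "B"): 10, ("B", "A"): 10, ("B", "B"): 40}
scores = log_odds_matrix(toy_counts)
```

Pairs seen more often than chance predicts get positive scores, and pairs seen less often get negative ones.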

Exercise session 3 (Monday 3 Feb):
 Problem 3.1 from the textbook.
 Problems 3.2 and 3.3 from the textbook.
 Simulate the Viterbi algorithm for the Dishonest Casino HMM (page 55 of the textbook) when
the emission sequence is 1 2 6 6. What is the Viterbi path in this case?
 Problem 3.5 from the textbook.
 Try the BLAST search engine.
 Search using blastn for the DNA sequence GAATTCCAATAGA in the Mouse G+T database.
What are the best hits you find?
 Search using blastp and PSI-BLAST for the amino acid sequences given in Fig. 2.1 of the textbook.
Find search parameters such that you get a nonempty but not very large set of hits.
Do you find any interesting hits?
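
A hand simulation of the Viterbi algorithm for this session can be checked with a short script. This is a sketch under stated assumptions, not the textbook's code: the transition and emission values are the ones usually quoted for the occasionally dishonest casino and should be verified against page 55, the uniform start distribution is an assumption, and no end state is modeled.

```python
import math

# Assumed model parameters; verify against page 55 of the textbook.
STATES = ("F", "L")  # F = fair die, L = loaded die
START = {"F": 0.5, "L": 0.5}  # assumption: uniform start distribution
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}


def viterbi(rolls):
    """Return (most probable state path, its joint log probability)."""
    # v[k]: best log probability of any state path that ends in state k.
    v = {k: math.log(START[k]) + math.log(EMIT[k][rolls[0]]) for k in STATES}
    backpointers = []
    for x in rolls[1:]:
        new_v, ptr = {}, {}
        for l in STATES:
            # Best predecessor of state l at this position.
            best = max(STATES, key=lambda k: v[k] + math.log(TRANS[k][l]))
            ptr[l] = best
            new_v[l] = v[best] + math.log(TRANS[best][l]) + math.log(EMIT[l][x])
        backpointers.append(ptr)
        v = new_v
    # Trace back from the best final state.
    last = max(STATES, key=lambda k: v[k])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), v[last]


path, log_prob = viterbi([1, 2, 6, 6])
print(path, math.exp(log_prob))
```

Working in log space turns the products into sums, which avoids underflow on longer roll sequences.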

Exercise session 4 (Monday 10 Feb):
 Analyze the sequence 6 1 6 of rolls using the HMM for the casino example (page 55 of the textbook).
You may assume that the transition probabilities from the initial state to 'Fair' and 'Loaded' are both 0.5.
Fix the other missing details as needed so that you can simulate the algorithms.
 Evaluate with the Forward algorithm the total probability of emitting 6 1 6.
 Evaluate with the Backward algorithm the total probability of emitting 6 1 6.
 For the HMM of the casino example (page 55 of the textbook) and the emission sequence 6 1 6, what is
the probability that symbol 1 was emitted from state 'Fair'? You may assume that the transition
probabilities from the initial state to 'Fair' and 'Loaded' are both 0.5. Fix the other missing details as needed
so that you can simulate the algorithms.
 Problem 3.8 from the textbook.
 Problems 3.10 and 3.11 from the textbook.
 On pages 78-79 the textbook describes a trick for adding log-transformed probability values
quickly and approximately correctly. Explain this trick.
 How can the trick be generalized to adding more than two numbers?
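
The Forward, Backward and posterior computations of this session, together with log-space addition, can be sketched in one script. Several things here are assumptions: the model parameters are the values usually quoted for the casino example (verify against page 55), the start distribution is uniform as the exercise suggests, no end state is modeled, and `log_add` uses the exact `math.log1p` rather than the approximate table lookup that the textbook's trick relies on. All function names are illustrative.

```python
import math

# Assumed model parameters; verify against page 55 of the textbook.
STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}


def log_add(log_p, log_q):
    """log(p + q) computed from log p and log q."""
    if log_p < log_q:
        log_p, log_q = log_q, log_p
    # log(p + q) = log p + log(1 + exp(log q - log p)); the exponent is <= 0.
    return log_p + math.log1p(math.exp(log_q - log_p))


def log_sum(log_values):
    """Generalization to many terms: factor out the largest one."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values))


def forward(rolls):
    """f[i][k] = log P(x_1..x_i, state at position i is k)."""
    f = [{k: math.log(START[k] * EMIT[k][rolls[0]]) for k in STATES}]
    for x in rolls[1:]:
        prev = f[-1]
        f.append({
            l: math.log(EMIT[l][x])
            + log_sum([prev[k] + math.log(TRANS[k][l]) for k in STATES])
            for l in STATES
        })
    return f


def backward(rolls):
    """b[i][k] = log P(x_{i+1}..x_n | state at position i is k)."""
    b = [{k: 0.0 for k in STATES}]  # log 1 at the last position
    for x in reversed(rolls[1:]):
        nxt = b[0]
        b.insert(0, {
            k: log_sum([math.log(TRANS[k][l] * EMIT[l][x]) + nxt[l] for l in STATES])
            for k in STATES
        })
    return b


rolls = [6, 1, 6]
f, b = forward(rolls), backward(rolls)
log_px_forward = log_add(f[-1]["F"], f[-1]["L"])
log_px_backward = log_sum(
    [math.log(START[k] * EMIT[k][rolls[0]]) + b[0][k] for k in STATES]
)
# Posterior probability that the middle roll (the 1) came from the fair die.
posterior_fair = math.exp(f[1]["F"] + b[1]["F"] - log_px_forward)
print(math.exp(log_px_forward), math.exp(log_px_backward), posterior_fair)
```

The two totals should agree, which is a useful check on a hand simulation.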

Exercise session 5 (Monday 17 Feb):
 Try the HMMER tool. Produce a profile HMM from the alignment given in Figure 5.3 of the textbook.
Is the resulting HMM any good? (More guidance on using HMMER is available.)
 Estimate for the HMM in Fig. 5.4 of the textbook the transition probabilities from state M3 to state I3
and from state M3 to state D4 using the alignment of Fig. 5.3 of the textbook.
 What is the value given to S1 when the MAP model construction algorithm is applied to the alignment
of Fig. 5.7 of the textbook?
 Exercise 6.1 from the textbook (page 143). Additional question: How many more sequences could be
aligned with a million times faster computer, for which a pairwise comparison takes 1/1 000 000 seconds?
 Explain how the MSA algorithm works (the textbook, pp. 143-144).
 Explain how simulated annealing can be used with the Baum-Welch (BW) algorithm (the textbook, pp. 155-157).
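
The arithmetic behind the additional question to Exercise 6.1 can be sketched as follows. Everything numeric here is an assumption for illustration and should be replaced with the figures from the exercise statement: the cost model (full dynamic programming over N sequences of length L costs one pairwise-comparison time multiplied by (2L)^(N-2)), the sequence length of 50, the one-second pairwise comparison on the slow computer, and the time budget of five billion years. The function name `max_alignable` is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def max_alignable(budget_seconds, pair_seconds, length):
    """Largest N >= 2 with pair_seconds * (2 * length) ** (N - 2) <= budget_seconds.

    Assumes the budget covers at least one pairwise comparison.
    """
    n = 2
    cost = pair_seconds  # assumed cost of aligning N = 2 sequences
    # Each extra sequence multiplies the assumed cost by 2 * length.
    while cost * (2 * length) <= budget_seconds:
        cost *= 2 * length
        n += 1
    return n


budget = 5e9 * SECONDS_PER_YEAR  # assumed time budget in seconds
slow = max_alignable(budget, 1.0, 50)   # assumed: 1 s per pairwise comparison
fast = max_alignable(budget, 1e-6, 50)  # a million times faster
extra = fast - slow
print(slow, fast, extra)
```

Under this cost model a speed-up by a factor of 10^6 buys only log(10^6) / log(2L) additional sequences, which is the point of the question.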

Exercise session 6 (Monday 24 Feb):
 Problems 1-5.
 Problem 6: T-Coffee is a multiple alignment software package. Explain the main principles of T-Coffee.
How does it differ from CLUSTAL? (Hint: look for answers on the WWW.) Try T-Coffee; the server is here.
Completing the course
Grading :
 60 points maximum :
 15 points from the exercises (50 % activity is the minimum requirement to pass,
about 80 % activity gives the full 15 points). All six sessions will count.
 45 points from the Exam.
Exam :
The course exam will be on Thursday 27 Feb at 16:00, room CK112.
The exam problems will be similar to those in the exercises: descriptions of concepts and
algorithms, and simulations of algorithms. The exam will be based on the material that is
available to you via the course web page.
Grades :
The grades are now available.