Biological Sequence Analysis

582483
5
Bioinformatics
Advanced studies
The course covers selected high-throughput methods for the analysis of biological sequences, including advanced alignment methods, Hidden Markov Models, and next-generation sequencing data analysis methods. Prerequisites: Basics of bioinformatics and algorithms.

Exam

27.02.2014 16.00 A111
Year Semester Date Period Language In charge
2014 spring 13.01-20.02. 3-3 English Esko Ukkonen

Lectures

Time Room Lecturer Date
Mon 12-14 B222 Esko Ukkonen 13.01.2014-20.02.2014
Thu 10-12 B222 Esko Ukkonen 13.01.2014-20.02.2014

Exercise groups

Group: 1
Time Room Instructor Date Observe
Mon 10-12 B222 Djamal Belazzougui 13.01.2014-21.02.2014
Mon 10-12 B222 Djamal Belazzougui 24.02.2014-24.02.2014

General

  1. Content
  • Exercise session 1 (Monday 20 Jan): problems 1.1, 1.2, 1.4, 2.1, 2.4 from the textbook.
  • Exercise session 2 (Monday 27 Jan):
  1. Problem 2.5 from the textbook.
  2. Problem 2.8 from the textbook.
  3. Problem 2.9 from the textbook.
  4. Explain how the PAM250 scoring matrix was constructed.
  5. Explain how the BLOSUM50 and BLOSUM62 matrices were constructed.
  • Exercise session 3 (Monday 3 Feb):
  1. Problem 3.1 from the textbook.
  2. Problems 3.2 and 3.3 from the textbook.
  3. Simulate the Viterbi algorithm for the Dishonest Casino HMM (page 55 of the textbook) when
    the emission sequence is 1 2 6 6. What is the Viterbi path in this case?
  4. Problem 3.5 from the textbook.
  5. Try the BLAST search engine.
    1. Use blastn to search for the DNA sequence GAATTCCAATAGA in the Mouse G+T database.
      What are the best hits you find?
    2. Use blastp and PSI-BLAST to search for the amino acid sequences given in Fig. 2.1 of the textbook.
      Find search parameters such that you get a non-empty but not very large set of hits.
      Do you find any interesting hits?
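
For item 3 of this session, a hand simulation of the Viterbi algorithm can be cross-checked with a short script. The sketch below is not part of the course material. It assumes the usual parameters of the casino model (fair die uniform; loaded die emitting 6 with probability 0.5 and each other face with probability 0.1; switching probabilities 0.05 from Fair to Loaded and 0.1 from Loaded to Fair) and, as in exercise session 4, initial probabilities of 0.5 for both states. Check these against page 55 of the textbook before relying on the result. All names are illustrative.

```python
import math

# ASSUMED casino HMM parameters (not stated on this page; verify against the textbook).
STATES = ("F", "L")  # F = fair die, L = loaded die
START = {"F": 0.5, "L": 0.5}  # assumption: same initial probabilities as in session 4
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {**{r: 0.1 for r in range(1, 6)}, 6: 0.5},
}


def viterbi(rolls):
    """Return (most probable state path, its joint probability) for a list of rolls."""
    # v[s] = log-probability of the best path that ends in state s at the current roll
    v = {s: math.log(START[s]) + math.log(EMIT[s][rolls[0]]) for s in STATES}
    backpointers = []
    for x in rolls[1:]:
        ptr, nxt = {}, {}
        for s in STATES:
            # best predecessor state for arriving in s
            best = max(STATES, key=lambda r: v[r] + math.log(TRANS[r][s]))
            ptr[s] = best
            nxt[s] = v[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][x])
        backpointers.append(ptr)
        v = nxt
    # trace back from the best final state
    last = max(STATES, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return "".join(path), math.exp(v[last])


if __name__ == "__main__":
    print(viterbi([1, 2, 6, 6]))
```

The recursion works in log space, so the same function can be reused for much longer roll sequences without numerical underflow.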
  • Exercise session 4 (Monday 10 Feb):
  1. Analyze the sequence of rolls 6 1 6 using the HMM for the casino example (page 55 of the textbook).
    You may assume that the transition probabilities from the initial state to 'Fair' and 'Loaded' are both 0.5.
    Also fix any other missing details so that you can simulate the algorithms.
    1. Evaluate with the Forward algorithm the total probability of emitting 6 1 6.
    2. Evaluate with the Backward algorithm the total probability of emitting 6 1 6.
  2. For the HMM of the casino example (page 55 of the textbook) and the emission sequence 6 1 6, what is
    the probability that symbol 1 was emitted from state 'Fair'? You may assume that the transition
    probabilities from the initial state to 'Fair' and 'Loaded' are both 0.5. Also fix any other missing details so
    that you can simulate the algorithms.
  3. Problem 3.8 from the textbook.
  4. Problems 3.10 and 3.11 from the textbook.
  5. The textbook describes on pages 78-79 a trick for adding log-transformed probability values
    quickly and approximately correctly.
    1. Explain this trick.
    2. How can the trick be generalized to adding more than two numbers?
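
Items 1, 2 and 5 of this session can be explored with a short script; the sketch below is meant for checking a hand simulation, not as a model answer. It assumes the usual casino parameters (fair die uniform; loaded die emitting 6 with probability 0.5; switching probabilities 0.05 and 0.1), initial probabilities of 0.5, and no explicit end state, so the backward values at the last roll are 1. For item 5 it assumes the trick is the identity log(p + q) = log p + log(1 + exp(log q - log p)); the textbook's fast version approximates the last term, which this sketch computes exactly. All names are illustrative.

```python
import math

# ASSUMED casino HMM parameters (verify against page 55 of the textbook).
STATES = ("F", "L")
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    "L": {**{r: 0.1 for r in range(1, 6)}, 6: 0.5},
}


def forward(rolls):
    """f[i][s] = P(rolls[0..i], state at i is s)."""
    f = [{s: START[s] * EMIT[s][rolls[0]] for s in STATES}]
    for x in rolls[1:]:
        prev = f[-1]
        f.append({s: EMIT[s][x] * sum(prev[r] * TRANS[r][s] for r in STATES)
                  for s in STATES})
    return f


def backward(rolls):
    """b[i][s] = P(rolls[i+1..] | state at i is s); no end state is modelled."""
    b = [{s: 1.0 for s in STATES}]
    for x in reversed(rolls[1:]):
        nxt = b[0]
        b.insert(0, {s: sum(TRANS[s][r] * EMIT[r][x] * nxt[r] for r in STATES)
                     for s in STATES})
    return b


def total_forward(rolls):
    return sum(forward(rolls)[-1].values())


def total_backward(rolls):
    b = backward(rolls)
    return sum(START[s] * EMIT[s][rolls[0]] * b[0][s] for s in STATES)


def posterior(rolls, i):
    """P(state at 0-based position i is s | rolls) for each state s."""
    f, b = forward(rolls), backward(rolls)
    total = sum(f[-1].values())
    return {s: f[i][s] * b[i][s] / total for s in STATES}


def log_add(lp, lq):
    """log(p + q) computed from lp = log p and lq = log q."""
    if lp < lq:
        lp, lq = lq, lp  # keep the exponent non-positive
    return lp + math.log1p(math.exp(lq - lp))


def log_sum(log_values):
    """Generalization to many terms: factor out the largest value first."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values))


if __name__ == "__main__":
    rolls = [6, 1, 6]
    print(total_forward(rolls), total_backward(rolls), posterior(rolls, 1))
```

The forward and backward totals should agree up to rounding, which is a useful sanity check on a hand calculation. The helpers `log_add` and `log_sum` assume finite inputs; a zero probability (log value of minus infinity) would need separate handling.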
  • Exercise session 5 (Monday 17 Feb):
  1. Try the HMMER tool. Produce a profile HMM from the alignment given in Fig. 5.3 of the textbook.
    Is the resulting HMM any good? (More guidance on using HMMER is available.)
  2. Estimate for the HMM in Fig. 5.4 of the textbook the transition probabilities from state M3 to state I3
    and from state M3 to state D4 using the alignment of Fig. 5.3 of the textbook.
  3. What is the value given to S1 when the MAP model construction algorithm is applied on the alignment
    of Fig. 5.7 of the textbook?
  4. Exercise 6.1 from the textbook (page 143). Additional question: How many more sequences could be
    aligned with a computer that is a million times faster, for which a pairwise comparison takes 1/1 000 000 seconds?
  5. Explain how the MSA algorithm works (the textbook pp. 143-144).
  6. Explain how simulated annealing can be used with the BW-algorithm (the textbook pp. 155-157).
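
For item 2 of this session, the transition probabilities out of a state are estimated from the number of times each outgoing transition is used in the alignment, optionally with pseudocounts added. A minimal sketch, assuming add-one pseudocounts as the default; the function name is illustrative and the counts in the usage lines are made up, they are not the counts of Fig. 5.3.

```python
def estimate_transitions(counts, pseudocount=1.0):
    """Estimate outgoing transition probabilities from observed counts.

    counts maps each target state to the number of times the transition
    into it was observed; pseudocount is added to every count.
    """
    total = sum(counts.values()) + pseudocount * len(counts)
    return {target: (n + pseudocount) / total for target, n in counts.items()}


if __name__ == "__main__":
    # HYPOTHETICAL counts of transitions leaving one match state.
    hypothetical_counts = {"M4": 8, "I3": 1, "D4": 1}
    print(estimate_transitions(hypothetical_counts))
```

Setting `pseudocount=0.0` gives the plain maximum likelihood estimate, which assigns probability zero to any transition that does not occur in the alignment.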
  • Exercise session 6 (Monday 24 Feb):
  • Problems 1-5.
  • Problem 6: T-Coffee is a multiple alignment program. Explain the main principles of T-Coffee.
    How does it differ from CLUSTAL? (Hint: search the web for answers.) Try T-Coffee on its web server.

 

 

Completing the course

Grading:

  • 60 points maximum:
    • 15 points from the exercises (50% activity is the minimum requirement to pass;
      about 80% activity gives the full 15 points). All six sessions will count.
    • 45 points from the exam.

Exam:

The course exam will be on Thursday 27 Feb at 16:00, room CK112.
The exam problems will be similar to those in the exercises: descriptions of concepts and
algorithms, and simulations of algorithms. The exam will be based on the material that is
available via the course web page.

Grades:

The grades are now available.