University of Helsinki Department of Computer Science
 

Biological Sequence Analysis exercises

Copies of Durbin's book can be found in the course folder in C127. Relevant parts of the book can also be found on Google Books.

Exercise session 1 (Thursday 6.11) :
  1. 1.2 from Durbin
  2. 1.4 from Durbin
  3. In the genome of an organism, nucleotides C, G, A, and T occur with frequencies 0.35, 0.35, 0.15, and 0.15, respectively. Assuming the independence model for the genome, what is the probability that a randomly selected DNA fragment of 15 nucleotides contains eight C's or G's and seven A's or T's?
  4. 2.1 from Durbin
  5. 2.4 from Durbin
  6. 2.5 from Durbin
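
A minimal sketch for checking the answer to exercise 3 above. It assumes the question means exactly eight positions holding C or G (and therefore seven holding A or T), so that under the independence model the count of C/G positions is binomial with success probability 0.35 + 0.35 = 0.7. The function name `prob_cg_count` is only illustrative.

```python
from math import comb

def prob_cg_count(n, k, p_cg):
    """Probability that exactly k of n independent positions are C or G."""
    return comb(n, k) * p_cg**k * (1 - p_cg)**(n - k)

p_cg = 0.35 + 0.35          # P(C) + P(G) from the exercise
p = prob_cg_count(15, 8, p_cg)
print(f"{p:.4f}")           # prints 0.0811
```

The factor `comb(15, 8)` counts the ways to choose which eight positions hold a C or G; each such arrangement has probability 0.7^8 * 0.3^7.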
Exercise session 2 (Thursday 13.11) :
  1. 2.8 from Durbin
  2. 2.9 from Durbin
  3. Explain how the PAM250 scoring matrix has been constructed (sect. 2.8 from Durbin)
  4. Explain how the BLOSUM50 and BLOSUM62 scoring matrices have been constructed (sect. 2.8 from Durbin)
  5. Describe the linear space alignment algorithm (sect. 2.6 from Durbin)
  6. 3.1 from Durbin
Exercise session 3 (Thursday 20.11) :
  1. 3.2 and 3.3 from Durbin
  2. Analyze the sequence 6 1 6 of rolls using the HMM for the casino example (page 54 of Durbin).
    1. Find the Viterbi path for the sequence 6 1 6
    2. Evaluate the total probability of emitting 6 1 6 (use Forward algorithm)
    You may assume that the transition probabilities from the initial state to 'Fair' and 'Loaded' are both 0.5.
  3. 3.8 from Durbin
  4. 3.10 and 3.11 from Durbin
  5. The book by Durbin et al. describes, on page 78, a trick for adding log-transformed probability values quickly and approximately correctly.
    1. Explain this trick
    2. How can one generalize the trick to adding more than two numbers?
  6. Try the BLAST search engine (www.ncbi.nlm.nih.gov/BLAST/).
    1. Search for the sequence GAATTCCAATAGA with blastn in the Yeast database (database "nr", and type "yeast").
    2. Search for the sequences given in Fig. 2.1 of Durbin using PSI-BLAST against the SWISSPROT database. Choose values for the sensitivity parameters such that you actually find something.
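
The following sketch can be used to check hand calculations for exercises 2 and 5 of this session. The casino parameters are not given on this page. The values below (fair die uniform, loaded die showing 6 with probability 0.5, transitions 0.95/0.05 and 0.90/0.10) are assumed from memory of the figure on page 54 and should be checked against the book. No end state is modelled. The helper `log_sum` is the exact form of the identity behind the page 78 trick, without the table lookup approximation, extended to any number of terms by factoring out the maximum.

```python
import math

STATES = ("F", "L")  # Fair, Loaded
START = {"F": 0.5, "L": 0.5}  # given in the exercise
# Assumed casino parameters; verify against Durbin, page 54.
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {"F": {r: 1 / 6 for r in range(1, 7)},
        "L": {**{r: 0.1 for r in range(1, 6)}, 6: 0.5}}

def viterbi(rolls):
    """Return (most probable state path, its joint probability)."""
    v = {s: START[s] * EMIT[s][rolls[0]] for s in STATES}
    back = []
    for r in rolls[1:]:
        # Best predecessor for each state, then the updated Viterbi values.
        ptr = {s: max(STATES, key=lambda p: v[p] * TRANS[p][s]) for s in STATES}
        v = {s: v[ptr[s]] * TRANS[ptr[s]][s] * EMIT[s][r] for s in STATES}
        back.append(ptr)
    last = max(STATES, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1], v[last]

def forward(rolls):
    """Total probability of the rolls, summed over all state paths."""
    f = {s: START[s] * EMIT[s][rolls[0]] for s in STATES}
    for r in rolls[1:]:
        f = {s: EMIT[s][r] * sum(f[p] * TRANS[p][s] for p in STATES)
             for s in STATES}
    return sum(f.values())

def log_sum(logs):
    """log(sum(exp(a))) computed stably: m + log(sum(exp(a - m)))."""
    m = max(logs)
    return m + math.log(sum(math.exp(a - m) for a in logs))

def forward_log(rolls):
    """Forward algorithm carried out entirely in log space."""
    f = {s: math.log(START[s] * EMIT[s][rolls[0]]) for s in STATES}
    for r in rolls[1:]:
        f = {s: math.log(EMIT[s][r])
                + log_sum([f[p] + math.log(TRANS[p][s]) for p in STATES])
             for s in STATES}
    return log_sum(list(f.values()))

path, p_best = viterbi([6, 1, 6])
p_total = forward([6, 1, 6])
print(path, p_best, p_total, math.exp(forward_log([6, 1, 6])))
```

With two terms, `log_sum` reduces to p + log(1 + exp(q - p)) for p >= q, which is the quantity the book approximates with an interpolated table.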
Exercise session 4 (Thursday 27.11) :
  1. 4.1 from Durbin (page 86). The full random model is given by the figure on page 83.
  2. 4.2 from Durbin (page 86).
  3. Describe the structure of a pair-HMM that corresponds to the gap model with *linear* gap penalties.
  4. One wants to list (global) pairwise alignments of two sequences in descending order of alignment score, starting from the highest-scoring alignment. How can you do this? Sketch an algorithm. NOTE: You are expected to list only the K best alignments.
  5. Try the HMMER tool (http://hmmer.janelia.org). Produce a profile-HMM from the alignment given in Figure 5.3 of Durbin. Is the resulting HMM any good? (More guidance on using HMMER is available at http://www.cs.helsinki.fi/u/prastas/hmmer.html .)
  6. For the HMM in Fig. 5.4 of Durbin, estimate the transition probabilities from state M3 to state I3 and from state M3 to state D4 using the alignment of Fig. 5.3 of Durbin.
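
One possible approach to exercise 4 of this session, given only as a sketch and not as the intended model answer: extend the global alignment dynamic programming so that every cell keeps its K best partial alignments instead of only the best one. The scoring values (match 1, mismatch -1, linear gap -2) and the function name `k_best_alignments` are assumptions chosen for illustration.

```python
import heapq

def k_best_alignments(x, y, k, match=1, mismatch=-1, gap=-2):
    """Return up to k tuples (score, aligned_x, aligned_y), best first."""
    n, m = len(x), len(y)
    # best[i][j] holds up to k entries (score, move, rank). `move` is the
    # last step ('D' diagonal, 'U' gap in y, 'L' gap in x) and `rank` is the
    # index of the extended entry in the predecessor cell's list.
    best = [[[] for _ in range(m + 1)] for _ in range(n + 1)]
    best[0][0] = [(0, None, 0)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:
                s = match if x[i - 1] == y[j - 1] else mismatch
                cands += [(sc + s, "D", r)
                          for r, (sc, _, _) in enumerate(best[i - 1][j - 1])]
            if i > 0:
                cands += [(sc + gap, "U", r)
                          for r, (sc, _, _) in enumerate(best[i - 1][j])]
            if j > 0:
                cands += [(sc + gap, "L", r)
                          for r, (sc, _, _) in enumerate(best[i][j - 1])]
            best[i][j] = heapq.nlargest(k, cands, key=lambda c: c[0])
    results = []
    for score, move, rank in best[n][m]:
        # Trace back through the stored (move, rank) pointers.
        i, j, ax, ay = n, m, [], []
        while move is not None:
            if move == "D":
                i, j = i - 1, j - 1
                ax.append(x[i]); ay.append(y[j])
            elif move == "U":
                i -= 1
                ax.append(x[i]); ay.append("-")
            else:
                j -= 1
                ax.append("-"); ay.append(y[j])
            _, move, rank = best[i][j][rank]
        results.append((score, "".join(reversed(ax)), "".join(reversed(ay))))
    return results

res = k_best_alignments("GAT", "GT", 3)
for score, ax, ay in res:
    print(score, ax, ay)
```

Keeping K entries per cell is enough because any prefix of one of the K best complete alignments must itself be among the K best partial alignments ending in its cell. The cost grows to roughly K times that of the ordinary algorithm in both time and memory.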
Exercise session 5 (Thursday 4.12) :
  1. What is the value given to S1 when the MAP model construction algorithm is applied to the alignment of Fig. 5.7 of Durbin?
  2. Exercise 6.1 from Durbin (page 142). Additional question: How many more sequences could be aligned with a computer that is a million times faster, so that a pairwise comparison takes 1/1 000 000 seconds?
  3. Explain how the MSA algorithm works (Durbin pp. 142-143)
  4. Explain how simulated annealing can be used with the BW-algorithm (Durbin pp. 155-156)
  5. Try the CLUSTAL program for constructing multiple alignments (more guidance at www.cs.helsinki.fi/u/prastas/clustal.html)
  6. T-Coffee is a recent multiple alignment program. Explain the main principles of T-Coffee. How does it differ from CLUSTAL? (Hint: Look for answers on the WWW.) Try T-Coffee (the server is at http://www.tcoffee.org/)
Extra exercise session 6 (Monday 8.12 at 14-16 in C221, no exercise points) :
  1.-5. pdf (see chapter 6 of Paul H. Dear (ed.): Bioinformatics (Methods Express), available online and in the course folder)
  6. Give course feedback in English or in Finnish!
Pasi Rastas
Last modified: Mon Dec 8 09:54:51 EET 2008