Biological Sequence Analysis
5
Bioinformatics
Advanced studies
The course covers selected high-throughput methods for the analysis of biological sequences, including advanced alignment methods, Hidden Markov Models, and next-generation sequencing data analysis methods. Prerequisites: Basics of bioinformatics and algorithms.
Exam
27.02.2014
16.00
A111
Year | Semester | Date | Period | Language | Person in charge |
---|---|---|---|---|---|
2014 | spring | 13.01-20.02. | 3-3 | English | Esko Ukkonen |
Lectures
Time | Room | Lecturer | Date |
---|---|---|---|
Mon 12-14 | B222 | Esko Ukkonen | 13.01.2014-20.02.2014 |
Thu 10-12 | B222 | Esko Ukkonen | 13.01.2014-20.02.2014 |
Exercise groups
Time | Room | Instructor | Date | Notes |
---|---|---|---|---|
Mon 10-12 | B222 | Djamal Belazzougui | 13.01.2014-21.02.2014 | |
Mon 10-12 | B222 | Djamal Belazzougui | 24.02.2014-24.02.2014 | |
General
- Content
- Exercise session 1 (Monday 20 Jan): problems 1.1, 1.2, 1.4, 2.1, 2.4 from the textbook.
- Exercise session 2 (Monday 27 Jan):
  - Problem 2.5 from the textbook.
  - Problem 2.8 from the textbook.
  - Problem 2.9 from the textbook.
  - Explain how the PAM250 scoring matrix has been constructed.
  - Explain how the BLOSUM50 and BLOSUM62 matrices have been constructed.
- Exercise session 3 (Monday 3 Feb):
  - Problem 3.1 from the textbook.
  - Problems 3.2 and 3.3 from the textbook.
  - Simulate the Viterbi algorithm for the Dishonest Casino HMM (page 55 of the textbook) when the emission sequence is 1 2 6 6. What is the Viterbi path in this case?
  - Problem 3.5 from the textbook.
  - Try the BLAST search engine.
    - Use blastn to search the Mouse G+T database for the DNA sequence GAATTCCAATAGA. What are the best hits you find?
    - Use blastp and PSI-BLAST to search with the amino acid sequences given in Fig. 2.1 of the textbook. Find search parameters such that you get a non-empty but not very large set of hits. Do you find any interesting hits?
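The Viterbi simulation asked for in session 3 can be checked against a small script. The following is a minimal sketch, not a model solution. It assumes the usual parameters of the casino example (fair die uniform, loaded die emitting 6 with probability 0.5 and each other face with 0.1, switching probabilities 0.05 and 0.1), start probabilities of 0.5 for both states, and no explicit end state. None of these numbers are given on this page, so verify them against page 55 of the textbook before relying on the result.

```python
# Dishonest casino HMM. All numeric parameters are ASSUMPTIONS (see the text above);
# compare them with page 55 of the textbook before use.
STATES = ("F", "L")  # F = fair die, L = loaded die
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {face: 1 / 6 for face in range(1, 7)},
    "L": {**{face: 0.1 for face in range(1, 6)}, 6: 0.5},
}


def viterbi(rolls):
    """Return (most probable state path, joint probability of that path and the rolls)."""
    # v[s] = probability of the best path that ends in state s at the current position
    v = {s: START[s] * EMIT[s][rolls[0]] for s in STATES}
    backpointers = []
    for x in rolls[1:]:
        pointer, nxt = {}, {}
        for s in STATES:
            # Best predecessor of state s at this position
            best_prev = max(STATES, key=lambda r: v[r] * TRANS[r][s])
            pointer[s] = best_prev
            nxt[s] = v[best_prev] * TRANS[best_prev][s] * EMIT[s][x]
        backpointers.append(pointer)
        v = nxt
    # Trace back from the best final state
    last = max(STATES, key=lambda s: v[s])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path, v[last]


path, prob = viterbi([1, 2, 6, 6])
print("".join(path), prob)
```

Working with plain probabilities is fine for four rolls. For long sequences the products underflow, and the recursion is normally run on log values instead.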
- Exercise session 4 (Monday 10 Feb):
  - Analyze the sequence of rolls 6 1 6 using the HMM for the casino example (page 55 of the textbook). You may assume that the transition probabilities from the initial state to 'Fair' and 'Loaded' are both 0.5. Also fix the other missing details so that you can simulate the algorithms.
    - Evaluate with the Forward algorithm the total probability of emitting 6 1 6.
    - Evaluate with the Backward algorithm the total probability of emitting 6 1 6.
  - For the HMM of the casino example (page 55 of the textbook) and the emission sequence 6 1 6, what is the probability that symbol 1 was emitted from state 'Fair'? You may assume that the transition probabilities from the initial state to 'Fair' and 'Loaded' are both 0.5. Also fix the other missing details so that you can simulate the algorithms.
  - Problem 3.8 from the textbook.
  - Problems 3.10 and 3.11 from the textbook.
  - The textbook describes on pages 78-79 a trick for adding log-transformed probability values quickly and approximately correctly.
    - Explain this trick.
    - How can the trick be generalized to more than two numbers?
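The session 4 tasks fit together: the Forward and Backward recursions can be run entirely on log values once there is a way to add two probabilities that are stored as logs. The sketch below uses the identity log(p + q) = log p + log(1 + exp(log q - log p)) and its extension to several terms by factoring out the largest one. It evaluates the logarithm exactly rather than through an interpolation table, so it illustrates the idea but not the speed-up. The emission and switching probabilities are assumed values for the casino example and are not given on this page. Only the start probabilities of 0.5 come from the exercise text. The function names are illustrative.

```python
import math

# Casino HMM. Emission and switching values are ASSUMPTIONS; check page 55 of the textbook.
STATES = ("F", "L")  # F = fair die, L = loaded die
START = {"F": 0.5, "L": 0.5}  # given in the exercise text
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {face: 1 / 6 for face in range(1, 7)},
    "L": {**{face: 0.1 for face in range(1, 6)}, 6: 0.5},
}


def log_add(a, b):
    """Return log(exp(a) + exp(b)) for two log-probabilities."""
    if a < b:
        a, b = b, a  # make a the larger value so exp(b - a) <= 1
    if b == -math.inf:
        return a
    return a + math.log1p(math.exp(b - a))


def log_sum(values):
    """Generalization to any number of terms: factor out the largest one."""
    values = list(values)
    m = max(values)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward(rolls):
    """f[i][s] = log P(x_1..x_i, state at position i is s), positions 0-indexed in the list."""
    f = [{s: math.log(START[s] * EMIT[s][rolls[0]]) for s in STATES}]
    for x in rolls[1:]:
        prev = f[-1]
        f.append({
            s: math.log(EMIT[s][x])
            + log_sum(prev[r] + math.log(TRANS[r][s]) for r in STATES)
            for s in STATES
        })
    return f


def backward(rolls):
    """b[i][s] = log P(x_{i+1}..x_n | state at position i is s); no end state assumed."""
    b = [{s: 0.0 for s in STATES}]  # log 1 at the last position
    for x in reversed(rolls[1:]):
        nxt = b[0]
        b.insert(0, {
            s: log_sum(math.log(TRANS[s][r] * EMIT[r][x]) + nxt[r] for r in STATES)
            for s in STATES
        })
    return b


rolls = [6, 1, 6]
f, b = forward(rolls), backward(rolls)
log_p_forward = log_sum(f[-1].values())
log_p_backward = log_sum(
    math.log(START[s] * EMIT[s][rolls[0]]) + b[0][s] for s in STATES
)
# Posterior probability that the middle roll (the 1) came from the fair die
posterior_fair = math.exp(f[1]["F"] + b[1]["F"] - log_p_forward)
print(math.exp(log_p_forward), math.exp(log_p_backward), posterior_fair)
```

The two totals should agree up to rounding, which is a useful sanity check on a hand simulation.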
- Exercise session 5 (Monday 17 Feb):
  - Try the HMMER tool. Produce a profile HMM from the alignment given in Figure 5.3 of the textbook. Is the resulting HMM any good? (More guidance on using HMMER is available.)
  - Estimate for the HMM in Fig. 5.4 of the textbook the transition probabilities from state M3 to state I3 and from state M3 to state D4 using the alignment of Fig. 5.3 of the textbook.
  - What is the value given to S1 when the MAP model construction algorithm is applied to the alignment of Fig. 5.7 of the textbook?
  - Exercise 6.1 from the textbook (page 143). Additional question: How many more sequences could be aligned with a million times faster computer, for which a pairwise comparison takes 1/1 000 000 seconds?
  - Explain how the MSA algorithm works (the textbook pp. 143-144).
  - Explain how simulated annealing can be used with the Baum-Welch (BW) algorithm (the textbook pp. 155-157).
- Exercise session 6 (Monday 24 Feb):
  - Problems 1-5.
  - Problem 6: T-Coffee is a multiple alignment program. Explain the main principles of T-Coffee. How does it differ from CLUSTAL? (Hint: Find answers on the WWW.) Try T-Coffee on its web server.
Completing the course
Grading:
- 60 points maximum:
  - 15 points from the exercises (50% activity is the minimum requirement to pass, about 80% activity gives the full 15 points). All six sessions will count.
  - 45 points from the exam.
Exam:
The course exam will be on Thursday 27 Feb at 16:00, room CK112.
The exam problems will be similar to those in the exercises: descriptions of concepts and algorithms, and simulations of algorithms. The exam will be based on the material that is available via the course web page.
Grades:
The grades are now available.