Data Structures in Genome Analysis (guided self study)
Year | Semester | Date | Period | Language | In charge |
---|---|---|---|---|---|
2016 | spring | 14.03-02.05. | 4-4 | English | Veli Mäkinen |
Lectures
Time | Room | Lecturer | Date |
---|---|---|---|
Mon 12-14 | B119 | Veli Mäkinen | 14.03.2016-02.05.2016 |
Exercise groups
Time | Room | Instructor | Date | Observe |
---|---|---|---|---|
Thu 10-12 | B119 | Veli Mäkinen | 14.03.2016—06.05.2016 |
Ilmoittautuminen tälle kurssille alkaa tiistaina 16.2. klo 9.00.
Registration for this course starts on Tuesday 16th of February at 9.00.
General
Week I:
- Monday 14.3: Overview. Rank structure and wavelet tree. (Veli Mäkinen)
-
Thursday17.3: Study group on core data structures, Chapter 3 (Veli Mäkinen)
- See www.genome-scale.info/errata.html for few mistakes on this chapter (please report more)
Week II:
-
Monday 21.3: Exercise 1 (Jarno Alanko): 3.1, 3.10, 3.12, 3.9, 3.6 (from course book pages 27-29)
- Hint for question 3.6. General Update-operation will be difficult (impossible?) with this solution, but one can update a value to a smaller one. Consider storing to vEB tree only keys whose value may be an answer to some semi-infinite range mininimum query. You can assume the vEB tree supports also successor operation (inverse of predecessor) to make things easier.
- Easter break 24.3-30.3
- Thursday 31.3: Study group on k-mer indexing and suffix arrays, Sections 8-8.2 (Veli Mäkinen)
Week III:
- Monday 4.4: Exercise 2 (Jarno Alanko): 8.2, 8.3, 8.5, 8.10, 8.11
-
Wednesday 6.4 12.30-14 B222: Study group on suffix trees, rest of Chapter 8 (Veli Mäkinen)
- Skip Lemma 8.17 (whose proof has a flaw), and study instead the solution linked at www.genome-scale.info/errata.html
- Suffix-prefix overlaps: black board 1, black board 2
Week IV:
-
Monday 11.4: Exercise 3 (Jarno Alanko): 8.7, 8.15, 8.16, 8.20 (typo in I[...], should be I[1..d]), 8.19 OR 5b simulation below
- 5b: Simulate suffix-prefix overlap computation (Sect. 8.4.4) on some simple but illustrative input; see the images from Wednesday study group above.
-
Thursday 17.4: Study group on Burrows-Wheeler indexing, Sections 9-9.4 (Veli Mäkinen)
- See the slide set from Monday 14.3 lecture for some animated examples.
- See the presentation of bidirectional BWT index.
Week V:
- Monday 18.4: Exercise 4 (Jarno Alanko): 9.2, 9.3, 9.5, 9.6, 9.7
- Thursday 21.4: Study group on de Bruijn graphs and overlap computation, Section 9.7 and Section 13.2.3 (Veli Mäkinen)
Week VI:
- Monday 25.4: Exercise 5 (Jarno Alanko): 9.18, 9.19, 9.21, 13.11, 13.10
- Thursday 28.4: Study group on maximal unique matches etc., Sections 11-11.1 (Veli Mäkinen)
Week VII:
- Monday 2.5: Study group on string kernels etc., Sections 11.2.1 and 11.2.3 (Veli Mäkinen): black board on telescoping technique
- Thursday 5.5: Helatorstai (holiday)
-
Exercise 6: Return by email to Jarno Alanko by 6.5. Solve any 5 of the following: 11.1, 11.2, 11.6, 11.10, 11.11, 11.13, 11.7, 11.17, 11.21, 11.20, 11.22
- Extra 1-2 points from solutions to 13.10
Completing the course
There is no course exam. Grading is based on the weekly exercises. There are five assignments per week. One should mark one assignment each week to pass. 12 points gives grade 1, 15 points gives grade 2, 18 points gives grade 3, 21 points gives grade 4, and 24 points gives grade 5. (Limits updated as we had no solution to Exercise 13.10; 29 is new maximum)
Literature and material
Veli Mäkinen, Djamal Belazzougui, Fabio Cunial, and Alexandru I. Tomescu. Genome-Scale Algorithm Design: Biological Sequence Analysis in the Era of High-Throughput Sequencing. Cambridge University Press. [Book webpage] [HY institutional e-book]