Modelling and Analysis in Bioinformatics
Year | Semester | Date | Period | Language | Person in charge |
---|---|---|---|---|---|
2016 | Autumn | 06.09-20.10. | 1-1 | English | Antti Honkela |
Lectures
Time | Room | Lecturer | Date |
---|---|---|---|
Tue 12-14 | B222 | Antti Honkela | 06.09.2016-20.10.2016 |
Thu 10-12 | B222 | Antti Honkela | 06.09.2016-20.10.2016 |
Exercise groups
Time | Room | Instructor | Date | Notes |
---|---|---|---|---|
Thu 12-14 | B221 | | 05.09.2016-21.10.2016 | |
General
The course explores computational models for biological networks, such as network motifs and gene regulation, and introduces probabilistic analysis of sequence-level problems in fragment assembly, pattern matching, and motif discovery. The lecturers are Juha Kärkkäinen, Leena Salmela and Antti Honkela.
Basic information
Credit points (ECTS): 5
Prerequisites: basic programming skills (Python)
Completing the course
The course consists of lectures, study groups and programming exercises. Attendance at the study groups and visiting lectures is mandatory. If you cannot attend a study group or a visiting lecture, contact the lecturers for an alternative assignment. The programming exercises are done in Python.
Schedule
-
5.9.-9.9. Genomic k-mer statistics (Kärkkäinen)
- Tuesday 6.9. 12-14 Lecture [Slides]
-
Thursday 8.9. 10-12 Study groups
- If you do not know your group, please contact Juha Kärkkäinen
-
Group 1
- Brocchieri: The GC Content of Bacterial Genomes. J Phylogen Evolution Biol 2:e108.
-
Group 2
- Carbone, Zinovyev and Képès: Codon adaptation index as a measure of dominating codon bias. Bioinformatics (2003) 19 (16): 2005-2015.
-
Group 3
- Chor, Horn, Goldman, Levy and Massingham: Genomic DNA k-mer spectra: models and modalities. Genome Biology 2009 10:R108
- You do not need to understand or even read everything but try to understand the main points and the figures.
-
Thursday 8.9. 12-14 Exercise session
- Exercise sheet (sneak peek in HTML)
- Deadline: 15.9.
- More information on completing the exercise and Jupyter Notebook below
-
12.9.-16.9. Randomized motif finding (Kärkkäinen)
- Tuesday 13.9. 12-14 Lecture [Slides]
-
Thursday 15.9. 10-12 Study group
-
Group 1 (Students whose last name starts with A-H)
- Thijs et al.: A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics (2001) 17 (12): 1113-1122.
-
Group 2: (Students whose last name starts with K-N)
- Buhler and Tompa: Finding Motifs Using Random Projections. Journal of Computational Biology 9(2): 225-242, 2002.
- Concentrate on parameter choice and experiments.
-
Group 3: (Students whose last name starts with O-Z)
- Zia and Moses: Towards a theoretical understanding of false positives in DNA motif finding. BMC Bioinformatics 2012 13:151.
- You do not need to understand or even read everything but try to understand the main points.
-
Thursday 15.9. 12-14 Exercise session
- Exercise sheet (sneak peek in HTML)
- Deadline: 22.9.
-
19.9.-23.9. Global network models (Salmela)
- Tuesday 20.9. 12-14 Lecture [Slides]
-
Thursday 22.9. 10-12 Study group
-
Group 1 (Students whose first name starts with A-I)
- Jeong et al.: The large-scale organization of metabolic networks. Nature 407, 651-654, 2000.
-
Group 2 (Students whose first name starts with J-Z)
- Pržulj, Corneil and Jurisica: Modeling interactome: scale-free or geometric? Bioinformatics 20 (18): 3508-3515, 2004.
- You only need to read the part of the article discussing global network properties (Sections 1-2.1.2 and 3.3-4).
-
Thursday 22.9. 12-14 Exercise session
- Exercise sheet (sneak peek in HTML)
- Deadline: 29.9.
-
26.9.-30.9. Network motifs (Salmela)
- Tuesday 27.9. 12-14 Lecture [Slides]
-
Thursday 29.9. 10-12 Study group
-
Group 1 (Students whose first name has exactly 5 characters)
- F. Schreiber and H. Schwöbbermeyer: Frequency concepts and pattern detection for the analysis of motifs in networks. Trans. on Comput. Syst. Biol: III, pp. 89-104, 2005.
- Concentrate on Section 4.
-
Group 2 (Students whose first name does not have exactly 5 characters)
- N. Kashtan, S. Itzkovitz, R. Milo and U. Alon: Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20(11):1746-1758, 2004.
- Concentrate on the Methods section.
-
Thursday 29.9. 12-14 Exercise session
- Exercise sheet (Sneak peek in HTML)
- Deadline: 6.10.
-
3.10.-7.10. Networks in biological systems (Honkela)
- Tuesday 4.10. 12-14 Lecture [Slides, demo shown at the lecture]
-
Thursday 6.10. 10-12 Study group
-
Papers:
- Gillespie D. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry 81(25):2340-2361, 1977.
- Gillespie D. The chemical Langevin equation. The Journal of Chemical Physics 113(1):297-306, 2000.
-
Tasks:
- Group 1 (Students with surname starting with A-H): Read Gillespie (1977), especially Secs. I, IIIB, IIIC
- Group 2 (Students with surname starting with I-L): Read Gillespie (2000), especially Secs. I, II, III
- Group 3 (Students with surname starting with M-Z): Read Gillespie (2000), especially Secs. I, II, IV
-
Thursday 6.10. 12-14 Exercise session
- Exercise sheet (Sneak peek in HTML)
- Deadline: 13.10.
-
10.10.-14.10. Network inference (Honkela)
- Tuesday 11.10. 12-14 Lecture [Slides, demo shown at the lecture]
-
Thursday 13.10. 10-12 Study group
-
O. Heinävaara, J. Leppä-aho, J. Corander and A. Honkela. On the inconsistency of l1-penalised sparse precision matrix estimation. arXiv:1603.02532 [cs.LG]
-
You do not need to understand all the details and descriptions of alternative approaches.
-
-
Thursday 13.10. 12-14 Exercise session
- Exercise sheet (Sneak peek in HTML)
- Deadline: 20.10.
-
17.10.-21.10. Guest lectures
-
Tuesday 18.10. 12-14
- Sampsa Hautaniemi: Cancer, Genomics and Networks
- Andre S. Ribeiro: Modelling, Analysis, and Validation of a Gene Regulatory Mechanism
-
Thursday 20.10. 10-12
- Mikko Arvas: Metabolic modelling for industrial biotechnology
- Teemu Kivioja: Computational challenges in genome-scale measurements of gene expression and its regulation
Exercises
You can work on the exercises in pairs or alone. Submit your solutions as an .ipynb file via Moodle.
The exercises consist of small programming projects in Python 3. They are given as Jupyter Notebook documents, which you complete with your solutions. Moodle has instructions on how to use the Jupyter Notebook environment on the CS department Linux workstations; you can also install it on your own computer. To get started with the exercises:
- Create a directory for your notebooks
- Copy the exercise file into that directory
- Open a terminal and move to the directory
- Run 'jupyter notebook'
This will start the Jupyter Notebook system and open a web browser in which you can work on the exercises. When you are done, close the web browser and press Ctrl-C twice in the terminal window to shut down the environment.
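The first two preparation steps can also be scripted. The following is only a minimal sketch, not part of the course material: the function name `prepare_notebook_dir` and the file and directory names in the comments are hypothetical, and it uses only the Python standard library.

```python
import shutil
from pathlib import Path


def prepare_notebook_dir(exercise_file, notebook_dir):
    """Create notebook_dir if needed and copy the exercise file into it.

    Both arguments are paths chosen by the student; nothing here is
    prescribed by the course. Returns the path of the copied notebook.
    """
    notebook_dir = Path(notebook_dir)
    # Step 1: create a directory for the notebooks (no error if it exists).
    notebook_dir.mkdir(parents=True, exist_ok=True)
    # Step 2: copy the exercise file into that directory.
    target = notebook_dir / Path(exercise_file).name
    shutil.copy(exercise_file, target)
    return target


# Hypothetical usage (file and directory names are placeholders):
# prepare_notebook_dir("exercise1.ipynb", "mabi-notebooks")
```

After this, open a terminal, move to the directory and run 'jupyter notebook' as described above.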
Grading
To pass the course:
- Attend study groups and visiting lectures
- Submit the programming exercises and get at least 6 points in each of the three exercise sets (probabilistic analysis of sequence-level problems, network models, network inference)
The course is graded on a scale of 1-5. Grading is based on the submitted programming exercises. In total, 60 points are available. To pass the course you must get at least 30 points, and a grade of 5 requires 50 points. If the exercises prove to be very difficult, these limits may be lowered.
The course does not include an exam and it is not possible to pass the course with a separate exam.
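As an illustration of how the point limits above combine, here is a minimal sketch in Python. It covers only the point-based criteria (attendance is not modelled), the function names and set labels are hypothetical, and the thresholds are the ones stated above, which the lecturers may lower. The limits for grades 2-4 are not given on this page, so they are not modelled.

```python
# Thresholds as stated in the grading rules; the course may lower them.
MIN_POINTS_PER_SET = 6   # minimum in each of the three exercise sets
PASS_TOTAL = 30          # minimum total to pass
TOP_GRADE_TOTAL = 50     # total required for a grade of 5


def passes_course(points_by_set):
    """Check the point-based pass criteria for a dict of set name -> points."""
    if len(points_by_set) != 3:
        raise ValueError("expected points for exactly three exercise sets")
    every_set_ok = all(p >= MIN_POINTS_PER_SET for p in points_by_set.values())
    return every_set_ok and sum(points_by_set.values()) >= PASS_TOTAL


def earns_top_grade(points_by_set):
    """A grade of 5 needs a pass plus at least TOP_GRADE_TOTAL points."""
    total = sum(points_by_set.values())
    return passes_course(points_by_set) and total >= TOP_GRADE_TOTAL
```

For example, 10 + 12 + 8 points meets both the per-set minimum and the 30-point total, whereas 5 + 20 + 20 points fails because one set is below 6 points even though the total is 45.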