Computational Genotype Analysis

Advanced studies
We will study statistical and algorithmic methods for the analysis of genetic variation in SNP (single nucleotide polymorphism) genotype data. Topics include measures of linkage disequilibrium, haplotype inference, haplotype block discovery, and detection of large-scale structural variation. Prerequisites: basics of genetics and statistics.
Year Semester Date Period Language Person in charge
2010 autumn 01.11-08.12. 2-2 English


Time Room Lecturer Date
Mon 10-12 B222 Mikko Koivisto 01.11.2010-08.12.2010
Wed 10-12 B222 Mikko Koivisto 01.11.2010-08.12.2010


Group: 1
Time Room Instructor Date Notes
Tue 14-16 C222 Mikko Koivisto 08.11.2010-10.12.2010

Registration for this course starts on Tuesday, 12 October, at 9.00. The lectures and exercises start in week 45 (8.11.-12.11.).

Information for international students

This course will be given in English if one or more attendees do not speak Finnish.


The course teaches the student how data analysis tasks in the field can be formalized and analyzed under the following design pattern:

Phenomenon -> Data -> Model -> Question -> Algorithm.

In our case, the form of the data is rather fixed, namely genotype data collected from some number of individuals at some number of genetic markers known as SNPs. Several examples of the pattern are considered in the context of various illustrative data analysis tasks; see below for the topics, the schedule of the lectures, and the material.
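
As a rough illustration of the pattern, the following Python sketch walks through one very simple instance. It is not taken from the lecture notes: the 0/1/2 genotype coding, the toy values, and the function name `allele_frequencies` are all assumptions made for this example.

```python
from typing import List

# Data: rows are individuals, columns are SNPs. Each entry counts the copies
# of one chosen allele (0, 1, or 2). The values are made up for illustration.
genotypes: List[List[int]] = [
    [0, 1, 2],
    [1, 1, 2],
    [2, 0, 1],
    [1, 2, 2],
]


def allele_frequencies(data: List[List[int]]) -> List[float]:
    """Estimate the frequency of the counted allele at each SNP.

    Model (assumed here): each individual carries two independent allele
    draws per SNP, each being the counted allele with probability p.
    Question: what is p at each SNP?
    Algorithm: the maximum-likelihood estimate is
    (number of counted alleles) / (2 * number of individuals).
    """
    n = len(data)
    m = len(data[0])
    return [sum(row[j] for row in data) / (2 * n) for j in range(m)]


print(allele_frequencies(genotypes))  # prints [0.5, 0.5, 0.875]
```

The tasks treated in the course follow the same five steps, with richer models and questions than this one.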

Note: There are important related topics not covered by the course, most notably phylogenetics, simulation of genotype data, estimation of recombination rates, and inference of population structure. Some of these topics are covered by other courses in the Master's Degree Programme in Bioinformatics.

Literature and material

The 2nd take home exam can be downloaded here. The deadline for submitting your answers is Mar 14, 2011, 11:59 pm.

The 1st take home exam can be downloaded here; for solutions click here.

A tentative plan for the lectures is as follows. Click on the title to download the week's lecture note, including the exercises for the next week's session. For information about passing and grading see the notes of Week 44.

Week 44: Introduction

Week 45: Measuring Linkage Disequilibrium

Week 46: Haplotype Inference

Week 47: Haplotype Block Discovery 

Week 48: Detecting Deletion Polymorphisms

Week 49: Detecting Inversion Polymorphisms (No lecture notes; see the papers by Sindi & Raphael and by Stefansson et al., linked below.)

Solutions to the exercises will be accumulated here (now Weeks II-V). 
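
For readers who want a concrete feel for the Week 45 topic, here is a minimal Python sketch of three commonly used pairwise linkage disequilibrium measures (D, D', and r^2) for two biallelic SNPs. It uses the standard textbook definitions and assumes phased haplotypes coded 0/1. Whether the lecture notes use exactly these definitions is an assumption, and the function name `pairwise_ld` and the toy data are hypothetical.

```python
from typing import List, Tuple


def pairwise_ld(haplotypes: List[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair of 0/1 alleles, one per SNP.
    Both SNPs are assumed to be polymorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # D: deviation of the joint frequency from independence.
    d = p_ab - p_a * p_b

    # D': D scaled by its largest possible magnitude given the allele
    # frequencies; the sign of D is kept.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max

    # r^2: squared correlation between the two allele indicators.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Toy sample of eight haplotypes (made-up values).
sample = [(1, 1), (1, 1), (0, 0), (0, 0), (1, 0), (0, 0), (1, 1), (0, 1)]
print(pairwise_ld(sample))  # prints (0.125, 0.5, 0.25)
```

Note that genotype data is unphased, so in practice haplotype frequencies would first have to be estimated; the sketch sidesteps this by assuming the phase is known.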


Clark A, Inference of haplotypes from PCR-amplified samples of diploid populations. Mol Biol Evol 7, 111-122 (1990)

Corona E, Raphael B, Eskin E, Identification of deletion polymorphisms from haplotypes. Proc. of RECOMB'07, LNCS 4453, 354-365 (2007)

Gabriel SB et al., The structure of haplotype blocks in the human genome. Science 296, 2225-2229 (2002)

Gusfield D, Inference of haplotypes from samples of diploid populations: complexity and algorithms. J Comput Biol 8 (2001)

Kidd JM et al., Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56-64 (2008)

Kollin J, Computational Methods for Detecting Large-Scale Chromosome Rearrangements in SNP Data. (PhD thesis) Report A-2010-3, Dept of CS, University of Helsinki, 2010.

McCarroll SA et al., Common deletion polymorphisms in the human genome. Nature Genetics 38, 86-92 (2006)

O'Reilly et al., invertFREGENE: software for simulating inversions in population genetic data. Bioinformatics 26, 838-840 (2010)

Patil N et al., Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294, 1719-1723 (2001)

Sindi SS, Raphael BJ, Identification and frequency estimation of inversion polymorphisms from haplotype data. J Comput Biol 17, 517-531 (2010)

Stefansson H et al., A common inversion under selection in Europeans. Nature Genetics 37, 129-137 (2005)

Wang N et al., Distribution of recombination crossovers and the origin of haplotype blocks: the interplay of population history, recombination, and mutation. Am J Hum Genet 71, 1227-1234 (2002)

Zhang K et al., HapBlock: haplotype block partitioning and tag SNP selection software using a set of dynamic programming algorithms. Bioinformatics 19, 1300-1301

Weir B, Inferences about linkage disequilibrium. Biometrics 35, 235-254 (1979)