Simulation and graph mining tools for improving gene mapping efficiency

Event type: 
Defence of thesis
Doctoral dissertation
Respondent: 
MSc Petteri Hintsanen
Opponent: 
Professor Michael Berthold, Universität Konstanz
Custos: 
Professor Hannu Toivonen, Helsingin yliopisto
Event time: 
30.09.2011 - 12:00 - 16:00
University of Helsinki, Main Building, Auditorium XIV, Unioninkatu 34

MSc Petteri Hintsanen will defend his thesis "Simulation and graph mining tools for improving gene mapping efficiency". Professor Hannu Toivonen will act as custos and Professor Michael Berthold, Universität Konstanz, as opponent.


Gene mapping is a systematic search for genes that affect observable characteristics of an organism. In this thesis we offer computational tools to improve the efficiency of (disease) gene-mapping efforts. In the first part of the thesis we propose an efficient simulation procedure for generating realistic genetic data from isolated populations. Simulated data is useful for evaluating hypothesised gene-mapping study designs and computational analysis tools. As an example of such an evaluation, we demonstrate how a population-based study design can be a powerful alternative to traditional family-based designs in association-based gene-mapping projects.

In the second part of the thesis, we consider the prioritisation of a (typically large) set of putative disease-associated genes acquired from an initial gene-mapping analysis. Prioritisation is necessary to be able to focus on the most promising candidates. We show how to harness current biomedical knowledge for the prioritisation task by integrating various publicly available biological databases into a weighted biological graph. We then demonstrate how to find and evaluate connections between entities, such as genes and diseases, in this unified schema using graph mining techniques.

In the last part of the thesis, we define the concept of a reliable subgraph and the corresponding subgraph extraction problem. Reliable subgraphs concisely describe strong and independent connections between two given vertices in a random graph, which makes them especially useful for visualising such connections. We propose novel algorithms for extracting reliable subgraphs from large random graphs. The efficiency and scalability of the proposed graph mining methods are backed by extensive experiments on real data. While our application focus is on genetics, the concepts and algorithms can be applied to other domains as well. We demonstrate this generality by considering coauthor graphs in addition to biological graphs in the experiments.
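
As a rough illustration of what a connection between two vertices in a random graph can mean, the sketch below estimates the probability that two vertices are connected when each edge exists independently with a given probability. This independence model, the function names and the toy graph are assumptions made for illustration only. The code is a plain Monte Carlo estimate and not one of the algorithms proposed in the thesis.

```python
import random


def connected(edges, s, t):
    """Return True if t is reachable from s over the given undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for w in adj.get(u, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


def estimate_reliability(prob_edges, s, t, samples=10000, seed=0):
    """Estimate the probability that s and t are connected.

    prob_edges is a list of (u, v, p) triples; each edge is assumed to
    exist independently with probability p (an illustrative assumption).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Draw one realisation of the random graph.
        realised = [(u, v) for (u, v, p) in prob_edges if rng.random() < p]
        if connected(realised, s, t):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    # Hypothetical toy graph: two routes between a gene and a disease.
    toy = [
        ("gene", "protein", 0.9),
        ("protein", "disease", 0.8),
        ("gene", "article", 0.5),
        ("article", "disease", 0.5),
    ]
    print(estimate_reliability(toy, "gene", "disease"))
```

Under these assumptions, a subgraph that keeps only a few edges while preserving most of this connection probability would be a natural candidate for visualisation.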

06.09.2011 - 16:07 Pirjo Moen
05.09.2011 - 12:37 Marina Kurtén