Lauri Eronen defends his PhD thesis Computational methods for augmenting association-based gene mapping on September 20th, 2013

M.Sc. Lauri Eronen will defend his doctoral thesis Computational methods for augmenting association-based gene mapping on Friday 20th of September 2013 at noon in the University of Helsinki Main Building, Unioninkatu 34, Auditorium XII (old part), 3rd floor. His opponent is Professor Joost Kok (Leiden University, The Netherlands) and the custos is Professor Hannu Toivonen (University of Helsinki). The defense will be held in English.

Computational methods for augmenting association-based gene mapping

The context and motivation for this thesis is gene mapping, the discovery of genetic variants that affect susceptibility to disease. The goals of gene mapping research include understanding disease mechanisms, evaluating individual disease risks and, ultimately, developing new medicines and treatments.

Traditional genetic association mapping methods test each measured genetic variant independently for association with the disease. One way to improve the power of detecting disease-affecting variants is to base the tests on haplotypes, strings of adjacent variants that are inherited together, instead of on individual variants. To enable haplotype analyses in large-scale association studies, this thesis introduces two novel statistical models and an efficient algorithm for haplotype reconstruction, jointly called HaploRec. HaploRec models local regularities of variable length in the haplotypes of the studied population and uses the obtained model to statistically reconstruct the most probable haplotypes for each studied individual. Our experiments demonstrate that HaploRec is especially well suited to data sets with a large number of markers and subjects, such as those typically used in the currently popular genome-wide association studies.
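To illustrate what "reconstructing the most probable haplotypes" means for a single individual, the sketch below enumerates every haplotype pair consistent with an unphased genotype and picks the pair that scores highest under a given table of haplotype probabilities. This is not HaploRec: the variable-length model and the efficient algorithm described above are not reproduced here. The genotype encoding (0, 1 or 2 copies of one allele per marker), the function names and the probability table are all hypothetical, and exhaustive enumeration grows exponentially with the number of heterozygous markers, which is exactly the limitation a method for genome-wide data has to avoid.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Yield every unordered haplotype pair (h1, h2) consistent with a genotype.

    Hypothetical encoding: each genotype entry is 0, 1 or 2, the number of
    copies of allele 1 at that marker. Haplotypes are tuples of 0/1 alleles.
    """
    het_positions = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous markers are fixed; heterozygous ones are filled in below.
    base = [1 if g == 2 else 0 for g in genotype]
    seen = set()
    for assignment in product((0, 1), repeat=len(het_positions)):
        h1, h2 = list(base), list(base)
        for pos, allele in zip(het_positions, assignment):
            h1[pos] = allele
            h2[pos] = 1 - allele
        pair = tuple(sorted((tuple(h1), tuple(h2))))  # unordered pair
        if pair not in seen:
            seen.add(pair)
            yield pair


def most_probable_pair(genotype, haplotype_prob, floor=1e-6):
    """Return the compatible pair maximising P(h1) * P(h2).

    haplotype_prob is a hypothetical lookup table standing in for a real
    population model; unseen haplotypes get a small floor probability.
    """
    best_pair, best_score = None, -1.0
    for h1, h2 in compatible_haplotype_pairs(genotype):
        score = haplotype_prob.get(h1, floor) * haplotype_prob.get(h2, floor)
        if score > best_score:
            best_pair, best_score = (h1, h2), score
    return best_pair


if __name__ == "__main__":
    freqs = {(0, 0, 0): 0.5, (1, 1, 0): 0.3, (0, 1, 0): 0.1, (1, 0, 0): 0.1}
    print(most_probable_pair((1, 1, 0), freqs))
```

With two heterozygous markers there are two candidate phasings, and the common haplotypes (0, 0, 0) and (1, 1, 0) win over the rarer alternative.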

Public biological databases contain large amounts of data that can help in determining the relevance of putative associations. In this thesis, we introduce Biomine, a database and search engine that integrates data from several such databases under a uniform graph representation. The graph database is used to derive a general proximity measure for biological entities represented as graph nodes, using a novel scheme that weights individual graph edges according to their informativeness and type. The resulting proximity measure can serve as a basis for various data analysis tasks, such as ranking putative disease genes and visualizing gene relationships.
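As a rough illustration of how edge weights can be turned into a proximity between two nodes, the sketch below assumes each edge carries a weight in (0, 1] and defines proximity as the best path probability: the maximum, over all paths, of the product of edge weights. This is one simple measure of this kind and an assumption made for the example; Biomine's actual edge-weighting scheme is not reproduced, and the node names and weights are hypothetical. Because maximising a product of weights in (0, 1] is the same as minimising the sum of their negative logarithms, ordinary Dijkstra search applies.

```python
import heapq
import math


def best_path_proximity(graph, source, target):
    """Maximum product of edge weights over all paths from source to target.

    graph: dict mapping node -> list of (neighbour, weight), 0 < weight <= 1.
    Returns 0.0 if target is unreachable from source.
    """
    # Dijkstra on edge costs -log(weight): the cheapest path is the
    # path with the largest product of weights.
    best_cost = {source: 0.0}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            return math.exp(-cost)
        for neighbour, weight in graph.get(node, []):
            if neighbour in visited:
                continue
            new_cost = cost - math.log(weight)
            if new_cost < best_cost.get(neighbour, math.inf):
                best_cost[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    return 0.0


if __name__ == "__main__":
    # Hypothetical toy graph: two routes connect geneA and geneB.
    graph = {
        "geneA": [("protein1", 0.9), ("go_term", 0.3)],
        "protein1": [("geneA", 0.9), ("geneB", 0.8)],
        "go_term": [("geneA", 0.3), ("geneB", 0.9)],
        "geneB": [("protein1", 0.8), ("go_term", 0.9)],
    }
    print(best_path_proximity(graph, "geneA", "geneB"))
```

Here the route through protein1 (0.9 × 0.8) beats the route through go_term (0.3 × 0.9), so the proximity is approximately 0.72. Ranking putative disease genes could then, for example, sort candidates by their proximity to a set of known disease genes.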

Our experiments show that relevant disease genes can be identified from among the putative ones with reasonable accuracy using Biomine. The best accuracy is obtained when a known reference set of disease genes is available, but experiments using a novel clustering-based method demonstrate that, under suitable conditions, putative disease genes can also be ranked without a reference set.

An important complementary use of Biomine is the search and visualization of indirect relationships between graph nodes, which can be used, e.g., to characterize the relationship of putative disease genes to already known disease genes. We provide two methods for selecting the subgraphs to be visualized: one based on the weights of the edges on the paths connecting the query nodes, and one based on using context-free grammars to define the types of paths to be displayed. Both of these query interfaces to Biomine are available online.

Availability of the dissertation

An electronic version of the doctoral dissertation is available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-952-10-9178-0.

Printed copies are available on request from Lauri Eronen: lauri.eronen@cs.helsinki.fi.

02.09.2013 - 14:10 Pirjo Moen