Jussi Määttä defends his PhD thesis on Model Selection Methods for Linear Regression and Phylogenetic Reconstruction on May 27th, 2016

M.Sc. Jussi Määttä will defend his doctoral thesis Model Selection Methods for Linear Regression and Phylogenetic Reconstruction on Friday 27 May 2016 at 14:00 in Auditorium B123 of the Exactum building, University of Helsinki (Gustaf Hällströminkatu 2b). His opponent is Professor Ivo Grosse (Martin Luther University of Halle-Wittenberg, Germany), and the custos is Professor Petri Myllymäki (University of Helsinki). The defence will be held in English.

Model Selection Methods for Linear Regression and Phylogenetic Reconstruction

Model selection is the task of choosing, from a collection of alternative explanations (often probabilistic models), the one best suited to a given data set. This thesis studies model selection methods in two domains, linear regression and phylogenetic reconstruction, focusing particularly on situations where the amount of available data is either small or very large.

In linear regression, the thesis concentrates on sequential methods for selecting a subset of the variables present in the data. A major result of the thesis is a proof that the Sequentially Normalized Least Squares (SNLS) method is consistent: if the correct answer (the so-called true model) exists, the method finds it with a probability that approaches one as the amount of data increases. The thesis also introduces a new sequential model selection method that is an intermediate form between SNLS and the Predictive Least Squares (PLS) method. In addition, the thesis shows how these methods can be used to enhance a novel algorithm for removing noise from images.
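To make the idea of a sequential criterion concrete, here is a minimal sketch of a Predictive Least Squares score, assuming the usual definition (due to Rissanen): a candidate subset of variables is scored by the sum of squared one-step-ahead prediction errors, where each observation is predicted from a least-squares fit to the preceding observations only, and the subset with the smallest score is chosen. The function names and the toy data are hypothetical, and the sketch does not implement SNLS or the new intermediate method introduced in the thesis.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def least_squares(X, y):
    """Ordinary least-squares coefficients via the normal equations X^T X b = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def pls_score(X, y, subset, start):
    """Sum of squared one-step-ahead prediction errors using the variables in `subset`.

    Observation t is predicted from a fit to observations 0..t-1 only;
    `start` must be at least len(subset) so that the first fit is well posed.
    """
    total = 0.0
    for t in range(start, len(y)):
        past = [[row[j] for j in subset] for row in X[:t]]
        beta = least_squares(past, y[:t])
        prediction = sum(b * X[t][j] for b, j in zip(beta, subset))
        total += (y[t] - prediction) ** 2
    return total


# Toy data (hypothetical): only variable 0 influences y.
rng = random.Random(0)
n = 60
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
y = [2.0 * row[0] + 0.1 * rng.gauss(0, 1) for row in X]

candidates = [(0,), (1,), (0, 1)]
scores = {s: pls_score(X, y, s, start=5) for s in candidates}
best = min(scores, key=scores.get)
print(scores, best)
```

Because every prediction is made before the corresponding observation is used for fitting, extra parameters are paid for through poorer early predictions rather than through an explicit penalty term. With this toy data the subset containing only the irrelevant variable 1 scores far worse than any subset containing variable 0.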

For phylogenetic reconstruction, that is, the task of inferring ancestral relations from genetic data, the thesis concentrates on the Maximum Parsimony (MP) approach, which seeks the phylogeny (family tree) that requires the fewest evolutionary changes. The thesis provides values for various numerical indicators that can be used to assess how much confidence can be placed in the phylogeny reconstructed by MP in various situations where the amount of data is small. These values were obtained from large-scale simulations, and they highlight that the vast number of possible phylogenies necessitates a sufficiently large data set. The thesis also extends the so-called skewness test, which is closely related to MP and can be used to reject the hypothesis that a data set is random, possibly indicating the presence of phylogenetic structure.
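The parsimony criterion can be illustrated with a minimal sketch based on Fitch's classic small-parsimony algorithm, which counts the minimum number of state changes that a fixed binary tree requires for each alignment column; MP then prefers the tree with the lowest total. The alignment, taxon names and function names below are hypothetical, the tree search is a brute-force comparison of the three possible four-taxon topologies, and this is not the software used in the thesis.

```python
def fitch_site(tree, states):
    """Minimum number of changes for one character on a rooted binary tree (Fitch).

    `tree` is a leaf name (str) or a pair (left, right) of subtrees;
    `states` maps each leaf name to its observed character state.
    """
    def visit(node):
        if isinstance(node, str):  # leaf: the set holds its observed state
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:  # children can agree: no extra change needed
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1  # one change needed

    return visit(tree)[1]


def parsimony_score(tree, alignment):
    """Total Fitch score summed over all columns of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(length)
    )


alignment = {"A": "AAG", "B": "AAT", "C": "GCT", "D": "GCT"}
# The three unrooted topologies for four taxa, each written with an arbitrary root.
topologies = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]
scores = {t: parsimony_score(t, alignment) for t in topologies}
best = min(topologies, key=scores.get)
print(best, scores[best])  # (('A', 'B'), ('C', 'D')) 3
```

The Fitch score does not depend on where the tree is rooted, so one rooting per unrooted topology suffices. Exhaustive search only works for a handful of taxa: the number of unrooted binary topologies for n taxa is (2n − 5)!!, which is 3 for four taxa but exceeds 10^20 for twenty, one reason why the amount of data matters so much for MP.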

Availability of the dissertation

An electronic version of the doctoral dissertation is available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-951-51-2150-9.

Printed copies will be available on request from Jussi Määttä: jussi.maatta@helsinki.fi.

Photographer: Veikko Somerpuro

30.09.2016 - 13:28 Pirjo Moen
09.05.2016 - 15:42 Pirjo Moen