Esther Galbrun defends her PhD thesis on December 4th, 2013 on Methods for Redescription Mining

Ingénieur diplômée Esther Galbrun will defend her doctoral thesis Methods for Redescription Mining on Wednesday, 4th of December 2013, at 12 o'clock in the University of Helsinki Main Building, Unioninkatu 34, Auditorium XV (old part), 3rd floor. Her opponent is Professor Nada Lavrač (Jožef Stefan Institute, Slovenia) and the custos is Professor Hannu Toivonen (University of Helsinki). The defense will be held in English.

Methods for Redescription Mining

In scientific investigations, data are often of different natures. For instance, they might originate from distinct sources or be cast over separate terminologies. In order to gain insight into the phenomenon of interest, a natural task is to identify the correspondences that exist between these different aspects.

This is the motivating idea of redescription mining, the data analysis task studied in this thesis. Redescription mining aims to find distinct common characterizations of the same objects and, vice versa, to identify sets of objects that admit multiple shared descriptions.

A practical example in biology consists in finding geographical areas that admit two characterizations, one in terms of their climatic profile and one in terms of the occupying species. Discovering such redescriptions can contribute to a better understanding of the influence of climate on species distribution. Besides biology, applications of redescription mining can be envisaged in medicine or sociology, among other fields.

Previously, redescription mining was restricted to propositional queries over Boolean attributes. However, many conditions, like the aforementioned climate, cannot be expressed naturally in this limited formalism. In this thesis, we consider more general query languages and propose algorithms to find the corresponding redescriptions, making the task relevant to a broader range of domains and problems.

Specifically, we start by extending redescription mining to non-Boolean attributes. In other words, we propose an algorithm to handle nominal and real-valued attributes natively. We then extend redescription mining to the relational setting, where the aim is to find corresponding connection patterns that relate almost the same object tuples in a network.
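To make the idea concrete, the following is a minimal sketch of what a single redescription over a real-valued climate attribute and a Boolean species attribute might look like. It is not the algorithm proposed in the thesis. The areas, attribute names, and thresholds are invented for illustration, and the use of the Jaccard coefficient to measure how closely the two queries agree is an assumption not stated in this announcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """A geographical area described in two views (hypothetical toy data)."""
    name: str
    mean_temp_c: float   # climate view: real-valued attribute
    has_moose: bool      # species view: Boolean attribute


def support(areas, query):
    """Return the names of the areas that satisfy the query."""
    return {area.name for area in areas if query(area)}


def jaccard(left, right):
    """Jaccard coefficient of two sets: |intersection| / |union|."""
    union = left | right
    return len(left & right) / len(union) if union else 0.0


# Invented example data, not taken from the thesis.
AREAS = [
    Area("A", -2.0, True),
    Area("B", 1.5, True),
    Area("C", 9.0, False),
    Area("D", -12.0, False),
    Area("E", 0.5, False),
]


def climate_query(area):
    # A condition on a real-valued attribute, which a purely
    # Boolean formalism cannot express directly.
    return -5.0 <= area.mean_temp_c <= 3.0


def species_query(area):
    return area.has_moose


climate_support = support(AREAS, climate_query)
species_support = support(AREAS, species_query)

# The two queries form a good redescription when their supports
# are almost the same, i.e. when this score is close to 1.
score = jaccard(climate_support, species_support)
print(sorted(climate_support), sorted(species_support), round(score, 3))
```

In this toy setting the climate query selects areas A, B and E while the species query selects A and B, so the pair agrees on two of three areas. A mining algorithm would search over many candidate query pairs, and over the thresholds themselves, rather than evaluating a single hand-written pair.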

We also study approaches for selecting high-quality redescriptions to be output by the mining process. The first approach relies on an interface for mining and visualizing redescriptions interactively and allows the analyst to tailor the selection of results to meet their needs. The second approach, rooted in information theory, is a compression-based method for mining small sets of associations from two-view datasets.

In summary, we take redescription mining outside the Boolean world and show its potential as a powerful exploratory method relevant in a broad range of domains.

Availability of the dissertation

An electronic version of the doctoral dissertation is available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-952-10-9431-6.

Printed copies are available on request from Esther Galbrun: tel. +358 (0)9 19151239 or esther.galbrun@cs.helsinki.fi.

20.06.2014 - 04:27 Esther Galbrun
25.11.2013 - 17:16 Pirjo Moen