Laura Langohr defends her PhD thesis on Methods for Finding Interesting Nodes in Weighted Graphs on June 30th, 2014

Diplom-Bionformatikerin Laura Langohr will defend her doctoral thesis Methods for Finding Interesting Nodes in Weighted Graphs on Monday the 30th of June 2014 at noon in the University of Helsinki Exactum Building, Auditorium CK112 (Gustaf Hällströminkatu 2b). Her opponent is Professor Dino Pedreschi (University of Pisa, Italy) and custos Professor Hannu Toivonen (University of Helsinki). The defense will be held in English.

Methods for Finding Interesting Nodes in Weighted Graphs

With the increasing amount of graph-structured data available, finding interesting objects, i.e., nodes in graphs, becomes more and more important. In this thesis we focus on finding interesting nodes and sets of nodes in graphs or networks. We propose several definitions of node interestingness as well as different methods to find such nodes.

Specifically, we propose to consider nodes as interesting based on their relevance and non-redundancy or representativeness w.r.t. the graph topology, as well as based on their characterisation for a class, such as a given node attribute value. Identifying nodes that are relevant, but non-redundant to each other is motivated by the need to get an overview of different pieces of information related to a set of given nodes. Finding representative nodes is of interest, e.g. when the user needs or wants to select a few nodes that abstract the large set of nodes. Discovering nodes characteristic for a class helps to understand the causes behind that class.

Next, four methods are proposed to find a representative set of interesting nodes. The first one incrementally picks one interesting node after another. The second iteratively changes the set of nodes to improve its overall interestingness. The third method clusters nodes and picks a medoid node as a representative for each cluster. Finally, the fourth method contrasts diverse sets of nodes in order to select nodes characteristic for their class, even if the classes are not identical across the selected nodes. The first three methods are relatively simple and are based on the graph topology and a similarity or distance function for nodes. For the second and third, the user needs to specify one parameter, either an initial set of k nodes or k, the size of the set. The fourth method assumes attributes and class attributes for each node, a class-related interesting measure, and possible sets of nodes which the user wants to contrast, such as sets of nodes that represent different time points. All four methods are flexible and generic. They can, in principle, be applied on any weighted graph or network regardless of what nodes, edges, weights, or attributes represent.

Application areas for the methods developed in this thesis include word co-occurrence networks, biological networks, social networks, data traffic networks, and the World Wide Web. As an illustrating example, consider a word co-occurrence network. There, finding terms (nodes in the graph) that are relevant to some given nodes, e.g. branch and root, may help to identify different, shared contexts such as botanics, mathematics, and linguistics. A real life application lies in biology where finding nodes (biological entities, e.g. biological processes or pathways) that are relevant to other, given nodes (e.g. some genes or proteins) may help in identifying biological mechanisms that are possibly shared by both the genes and proteins.

Availability of the dissertation

An electronic version of the doctoral dissertation is available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-952-10-9977-9.

Printed copies are available on request from Laura Langohr: laura.langohr@cs.helsinki.fi or +358 2941 51303.

11.06.2014 - 13:04 Pirjo Moen
11.06.2014 - 13:04 Pirjo Moen