Discovery group: Data Mining for Pattern and Link Discovery



Department of Computer Science

Finnish Centre of Excellence for Algorithmic Data Analysis Research

Helsinki Institute for Information Technology HIIT


Research Topics

Here is a (slightly outdated) sample of our research topics. See the Projects page for separately funded research projects.

  Elot sai karkelojen teitä,
  lumi ajan kotia,
  hiljaa soi kodit autiot,
  hiljaa sai armaat karkelot -
  laiho sai lumien riemut.

  Lives got the frolic ways,
  snow the home of time,
  softly chimed abandoned homes,
  softly got frolics beloved -
  ripening crop got the snows' joys.

Computational poetry is a challenging research topic at the intersection of artificial intelligence, natural language processing, and cognitive science. Our main focus has been on developing computational methods that automatically produce creative texts regarded as poems, with a minimal amount of linguistic rule specification.

An example poem in Finnish, with its translation into English, is given on the side.

Computational humor is a new area of computer science addressing methods and tools for the study and the simulation of humor. Our research is focused on the technological and generative aspects of humor. In other words, we explore ways to apply state-of-the-art artificial intelligence and computational linguistics in order to make people laugh.

We investigate possible uses of data mining for detecting linguistic ambiguity and then exploiting it to induce surprise. Moreover, we want to analyze events in which computer mistakes are perceived as ridiculous (e.g. unintentionally funny autocomplete suggestions) and use them to achieve forms of intentional humor. Finally, we are investigating strategies for increasing humorous expressivity through the combination of kinetic typography and creative coding.

Novelty detection in texts and graphs allows one to find new, rare, or exceptional information. It also has applications in computational creativity. Our goal is to develop relatively language-independent methods to find novel and/or creative associations in free text. The general idea is to turn a big corpus of free text into a concept graph and then develop methods which can infer or extract something interesting from that network.
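As a rough illustration of the corpus-to-graph idea, the sketch below links words that co-occur in a sentence and then lists word pairs that never co-occur but share a neighbour. It is only a minimal example under stated assumptions: the function names, the whitespace tokenisation, and the three-sentence corpus are hypothetical and are not the group's actual method.

```python
from collections import Counter
from itertools import combinations


def build_concept_graph(sentences, stopwords=frozenset()):
    """Map each unordered word pair to the number of sentences it co-occurs in."""
    edges = Counter()
    for sentence in sentences:
        # Naive tokenisation (an assumption): lowercase and split on whitespace.
        words = {w for w in sentence.lower().split() if w not in stopwords}
        for a, b in combinations(sorted(words), 2):
            edges[(a, b)] += 1
    return edges


def indirect_associations(edges):
    """Return pairs that never co-occur but share at least one neighbour."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    found = {}
    for a, b in combinations(sorted(neighbours), 2):
        if b not in neighbours[a]:
            shared = neighbours[a] & neighbours[b]
            if shared:
                found[(a, b)] = shared
    return found


# Hypothetical toy corpus, for illustration only.
corpus = ["snow covers home", "home brings joy", "snow melts"]
graph = build_concept_graph(corpus)
links = indirect_associations(graph)
```

Here `links` contains, among others, the pair `("joy", "snow")`, connected only through `"home"`. A real system would need proper tokenisation and a statistical measure of how surprising each association is.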

Graph mining methods discover useful information from graphs. Specifically, how can we find sets of interesting entities that are worth exploring for the user? This question covers many challenging and important problems. Consider, as a toy example, the network below and a user who wants to know how Barcelona and Helsinki are related. We address the problem by identifying other nodes that are relevant with respect to the query nodes but non-redundant with respect to each other.
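One simple way to picture "relevant but non-redundant" is a greedy selection: rank candidate nodes by their total hop distance to the query nodes, then skip any candidate adjacent to one already chosen. The sketch below assumes exactly that. The distance-based relevance, the adjacency-based redundancy test, and the small graph are all hypothetical stand-ins, not the measures or the network used in our work.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def select_nodes(graph, queries, k):
    """Greedily pick up to k nodes close to all queries and not adjacent to each other."""
    dists = [bfs_distances(graph, q) for q in queries]
    candidates = [v for v in graph
                  if v not in queries and all(v in d for d in dists)]
    # Assumed relevance: smaller total distance to the query nodes is better.
    candidates.sort(key=lambda v: (sum(d[v] for d in dists), v))
    chosen = []
    for v in candidates:
        # Assumed redundancy test: a neighbour of a chosen node adds little.
        if all(v not in graph[c] for c in chosen):
            chosen.append(v)
        if len(chosen) == k:
            break
    return chosen


# Hypothetical toy graph; the edges are illustrative only.
edge_list = [("Barcelona", "Spain"), ("Spain", "EU"), ("EU", "Finland"),
             ("Finland", "Helsinki"), ("Barcelona", "Olympics"),
             ("Olympics", "Helsinki")]
toy = {}
for a, b in edge_list:
    toy.setdefault(a, set()).add(b)
    toy.setdefault(b, set()).add(a)

result = select_nodes(toy, ["Barcelona", "Helsinki"], 3)
```

On this toy graph the selection returns `["Olympics", "EU"]`: Finland and Spain are as close to the queries as EU, but both are adjacent to it and are therefore treated as redundant.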

Network abstraction is motivated by the growth of networks in many areas of life. The goal is to summarize a large network into a smaller one that is more useful for visualization, analysis, and understanding. Our work differs from mainstream graph mining and analysis: whereas typical graph mining focuses on finding patterns or communities in networks, we investigate methods to make networks simpler. We develop novel methods to solve some instances of the problem, with applications in bioinformatics.
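To make the idea of simplification concrete, here is a minimal sketch of one possible instance: dropping the weakest edges of a weighted graph for as long as the graph stays connected. The pruning rule, the function names, and the example weights are assumptions chosen for illustration. This is not a description of our published algorithms.

```python
def is_connected(nodes, edges):
    """Check whether the undirected graph (nodes, edges) is a single component."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        for nb in adjacency[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(nodes)


def prune_edges(weighted_edges, target):
    """Remove the lightest edges while the graph stays connected, down to target edges."""
    nodes = {n for a, b in weighted_edges for n in (a, b)}
    kept = dict(weighted_edges)
    # Visit edges from weakest to strongest (ties broken by edge name).
    for edge in sorted(weighted_edges, key=lambda e: (weighted_edges[e], e)):
        if len(kept) <= target:
            break
        remaining = [e for e in kept if e != edge]
        if is_connected(nodes, remaining):
            del kept[edge]
    return kept


# Hypothetical weighted graph: {(node, node): edge weight}.
edges = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.2, ("C", "D"): 0.1}
simplified = prune_edges(edges, target=3)
```

In this example the weakest edge, C-D, is kept because removing it would cut off D, while the redundant edge A-C is removed. The simplified graph keeps A-B, B-C, and C-D.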

Redescription mining is a powerful data analysis method that aims at finding multiple descriptions of the same entities. An example is to characterize geographical regions by the fauna that inhabits them on the one hand and by their meteorological conditions on the other. This is an important task in biology, known as niche-finding. We study redescription mining in various settings, extending the problem definition and designing new algorithms.
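A common way to judge whether two descriptions cover "the same entities" is the Jaccard coefficient of the sets of entities each one selects. The sketch below assumes this measure and applies it to a fauna-based and a climate-based description. The region records, attribute names, and thresholds are invented for illustration and are not real data.

```python
def support(entities, predicate):
    """Names of the entities whose attributes satisfy the predicate."""
    return {name for name, attrs in entities.items() if predicate(attrs)}


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical regions: fauna present and mean January temperature (deg C).
regions = {
    "R1": {"fauna": {"moose"}, "jan_temp": -8.0},
    "R2": {"fauna": {"moose", "fox"}, "jan_temp": -5.0},
    "R3": {"fauna": {"fox"}, "jan_temp": 3.0},
    "R4": {"fauna": {"moose"}, "jan_temp": 1.0},
}

# Two descriptions over different attribute sets.
has_moose = support(regions, lambda r: "moose" in r["fauna"])
cold = support(regions, lambda r: r["jan_temp"] < 0)
score = jaccard(has_moose, cold)
```

Here "moose present" selects three regions and "January mean below zero" selects two of them, so the score is 2/3. A redescription mining algorithm would search over many such pairs of descriptions for those with a high score.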








22.12.2016 - 17:22 Hannu Toivonen
02.05.2012 - 22:53 Hannu Toivonen