University of Helsinki Department of Computer Science



Discovery group: Data Mining for Pattern and Link Discovery

This is our (very) old web page. The new page is at

We develop novel methods and tools for pattern and link discovery. Our focus is on structured and heterogeneous data, such as graphs, as well as on sequences. Data mining in heterogeneous and structured data will only grow in importance, and it will raise an increasing number of challenging and important problems, especially in scientific applications. Our current main applications are in bioinformatics, in collaboration with applied scientists and companies.

Our research topics are motivated by novel problems in applications. Our current emphasis is on analysis and link discovery in weighted (biological) graphs (Biomine project). We identify computational problems in such graphs, develop new algorithms for them, and apply these algorithms. While we value fielded applications with an impact, we also emphasize solid, application-independent methods and results. We recently introduced (variable-length) Markov models to the problem of reconstructing haplotype strings [BMC Bioinformatics, software]. We have developed novel concepts and methods for gene mapping, for instance based on the discovery of genetically motivated tree-structured patterns [IEEE/ACM Transactions on Computational Biology and Bioinformatics, American Journal of Human Genetics, software]. These methods have proved very useful in the practice of medical genetics. In context-sensitive computation, the group developed the ContextPhone software, which is in use at several research institutions around the world [IEEE Pervasive Computing, software].

The group works jointly under the Department of Computer Science at the University of Helsinki and the Helsinki Institute for Information Technology (HIIT). We are part of Algodan (Algorithmic Data Analysis), a national Centre of Excellence in research.



Projects

Biomine: A biological search engine
We view biological databases of sequences, proteins, genes, etc. as weighted graphs and develop methods for link discovery and analysis in such graphs. Try out the prototype search engine at! (Funding: National Technology Agency (Tekes) and companies.)
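To give a concrete feel for link discovery in a weighted graph, here is a minimal Python sketch. It assumes that edge weights are probabilities in (0, 1] and that the link between two nodes is scored by its single most probable path. This scoring choice, the adjacency-list encoding, the toy node names, and the function `best_path` are all hypothetical illustrations, not the Biomine implementation.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: node -> list of (neighbor, edge probability in (0, 1]).
# Edges are directed here; add both directions for an undirected graph.
Graph = Dict[str, List[Tuple[str, float]]]


def best_path(graph: Graph, source: str, target: str) -> Tuple[float, Optional[List[str]]]:
    """Return (probability, path) of the most probable source-target path.

    Maximizing a product of probabilities is the same as minimizing the sum
    of -log(p), which is non-negative, so ordinary Dijkstra applies.
    """
    dist = {source: 0.0}          # best accumulated -log probability found so far
    prev: Dict[str, str] = {}     # predecessor links for path reconstruction
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue              # stale heap entry
        done.add(node)
        if node == target:
            break
        for nbr, p in graph.get(node, []):
            nd = d - math.log(p)
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in done:
        return 0.0, None          # target unreachable from source
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-dist[target]), path


# Hypothetical toy graph: two competing routes from a gene to a disease.
toy: Graph = {
    "geneA": [("proteinX", 0.9), ("article1", 0.5)],
    "proteinX": [("diseaseD", 0.6)],
    "article1": [("diseaseD", 0.9)],
}

prob, path = best_path(toy, "geneA", "diseaseD")
print(round(prob, 2), path)  # 0.54 ['geneA', 'proteinX', 'diseaseD']
```

The route through `proteinX` (0.9 × 0.6 = 0.54) beats the one through `article1` (0.5 × 0.9 = 0.45). The log transform is used so that a standard shortest-path algorithm can be reused unchanged. Richer link measures, such as aggregating over many paths, would need a different algorithm.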

Bison: Bisociation Networks for Creative Information Discovery
The aim is to develop and validate a novel computational methodology that facilitates bisociative information discovery in large-scale heterogeneous information environments. (Funding: European Commission under the Seventh Framework Programme (FP7); ongoing.)

Context: Context Recognition by User Situation Data Analysis
The Context project studies the characterization and analysis of information about the user's context and its use in proactive adaptivity. We have developed data analysis algorithms as well as ContextPhone, a context-aware prototyping platform for mobile phones, available as free software. (Funding: Academy of Finland, PROACT Programme; formally finished, work continues with internal funding.)

Data mining in genetics
We develop models, methods and tools for analyzing genetic data, in particular for gene mapping and haplotype analysis. (Funding: National Technology Agency (Tekes) and companies, HIIT; formally finished, work continues with internal funding.)


Software

Biomine search engine
A prototype of an associative search engine for biological information, integrated from multiple public sources and implemented using techniques developed in the project.

HaploRec - Haplotype reconstruction
Scalable software for population-based haplotype phasing, especially for sparse marker maps.

ContextPhone - Context-aware platform for mobile phones
ContextPhone is an open software platform for context-aware applications. It can be used to collect, analyze and transmit information about its context, as well as to tag and publish contextual media.

HPM and TreeDT - Gene mapping methods
Software for association analysis, i.e., gene mapping based on linkage disequilibrium.

AsVis - Visualization of association rules in SNP neighborhoods
A web application, available as source code, for visualizing association rules obtained from short, sequential data, such as SNP neighborhoods (online demo).

Bassist - MCMC simulation for Bayesian statistical models
Bassist is a tool that automates the use of hierarchical Bayesian models in complex analysis tasks by generating a model-specific MCMC sampler. Bassist is no longer supported.


People

Group leader


PhD students

Past visitors and postdocs





Contact: Prof. Hannu Toivonen, email