University of Helsinki Department of Computer Science

Annual report 2006

Discovery group: Data Mining for Pattern and Link Discovery

We develop novel methods and tools for pattern and link discovery. Our focus is on structured and heterogeneous data, such as graphs, as well as on sequences. The main application areas are bioinformatics, genetics, and ubiquitous computing, and we work in close collaboration with scientists and companies in these areas.

Our current emphasis is on analysis and link discovery in weighted (biological) graphs (the Biomine project). In sequence analysis, we recently introduced (variable-length) Markov models to the problem of reconstructing haplotype strings. We have also developed novel concepts and methods for gene mapping, for instance methods based on the discovery of genetically motivated tree-structured patterns. We apply these methods to real problems together with our partners, and they have proved very useful in modern high-throughput medical genetics. In context-sensitive computation, the group developed the ContextPhone software, which is widely used by research institutions around the world.

The group operates jointly under the Department of Computer Science at the University of Helsinki and the Helsinki Institute for Information Technology (HIIT). We are part of the national Centre of Excellence From Data to Knowledge (FDK).

Contact person: Professor Hannu Toivonen

Website: http://www.cs.helsinki.fi/research/discovery

Projects

Biomine
Context

Publications

Eronen, L. & Geerts, F. & Toivonen, H.: HaploRec: efficient and accurate large-scale reconstruction of haplotypes. BMC Bioinformatics. London : BioMed Central. 7 (2006) : 542, 38 p.

Hintsanen, P. & Sevon, P. & Onkamo, P. & Eronen, L. & Toivonen, H.: An empirical comparison of case-control and trio based study designs in high throughput association mapping. Journal of Medical Genetics. London : British Medical Association. 43 (2006), p. 617-624.

Muhonen, J. & Toivonen, H.: Closed non-derivable itemsets. PKDD 2006: European Conference on Principles and Practice of Knowledge Discovery in Databases: Knowledge discovery in databases. - Berlin : Springer 2006. p. 601-608.

Sevon, P. & Eronen, L. & Hintsanen, P. & Kulovesi, K. & Toivonen, H.: Link discovery in graphs derived from biological databases. DILS 2006: Data integration in the life sciences. - Berlin : Springer 2006. p. 35-49.

Sevon, P. & Toivonen, H. & Ollikainen, V.: TreeDT: tree pattern mining for gene mapping. IEEE/ACM Transactions on Computational Biology and Bioinformatics. New York (NY) : IEEE. 3 (2006) : 2, p. 174-185.