University of Helsinki Department of Computer Science
 

Annual report 2005

Complex Systems Computation Research Group – CoSCo

The CoSCo research group studies computational problems in complex systems, especially those concerning prediction and modelling. Its research areas include stochastic modelling and data analysis; Bayesian networks and related probabilistic model families, such as finite mixture models, Bayesian multi-nets and discrete principal component analysis; information-theoretic approaches to inference (MDL); and stochastic optimisation algorithms such as simulated annealing and genetic algorithms.

The work has a strong basic research component at the intersection of computer science, information theory and mathematical statistics. The research also aims at a strong applied component: the theoretical results on methods are applied in many fields, such as the social sciences, criminology, ecology, medicine and industry.

Recent research has focused on personalisation of the internet, diagnostics for space satellites, next-generation search engine techniques and modelling for location-aware services.

The members of the group have varied strengths, ranging from theoretical research to excellent programming skills. One concrete example of the group's broad expertise is the unique B-Course data analysis server (http://b-course.hiit.fi), developed and maintained by the group, which applies the latest research results in probabilistic modelling. During its three years of existence, the server has been accessed by over 15,000 users world-wide, and results from the analysis service have been used, for example, in the development of a vaccine against HIV, in analysing birdsong and in studying gene data.

In 2004, next-generation search engines became one of the group's most important focus areas (see http://cosco.hiit.fi/search). CoSCo aims to become an important international player in open-source development in this field, and co-ordinates a large EU project in the area that started in 2004 (Alvis, see http://cosco.hiit.fi/search/alvis.html).

The more theoretical side of the group's research is represented by the www.mdl-research.org website, which the group continued to maintain in 2005. The website aims to collect in one place the main research results on the Minimum Description Length (MDL) theory developed by Jorma Rissanen. Rissanen also co-operates actively with the group.

CoSCo works as part of HIIT, and in the recent evaluation by HIIT's scientific board the group was judged to be at the international cutting edge. The group has excellent research contacts with the leading groups working on probabilistic modelling and co-operates with many top researchers abroad.

Contact person: Professor Petri Myllymäki

Homepage: http://cosco.hiit.fi/

Publications

Roos T. & Wettig H. & Grünwald P. & Myllymäki P. & Tirri H.
On Discriminative Bayesian Network Classifiers and Logistic Regression. Machine Learning 59:3, pp. 267-296.

Kontkanen P. & Myllymäki P. & Buntine W. & Rissanen J. & Tirri H.
An MDL Framework for Data Clustering. In Advances in Minimum Description Length: Theory and Applications, edited by P. Grünwald, I.J. Myung and M. Pitt. The MIT Press, 2005.

Buntine W.
Open Source Search: A Data Mining Platform. SIGIR Forum, June 2005.

Perkiö J. & Tuulos V. & Buntine W. & Tirri H.
Multi-Faceted Information Retrieval System for Large Scale Email Archives. In Proceedings of the IEEE/WIC/ACM Conference on Web Intelligence (WI 2005).

Miettinen M. & Nokelainen P. & Kurhila J. & Silander T. & Tirri H.
EDUFORM - A Tool for Creating Adaptive Questionnaires. International Journal on E-Learning, Vol. 4 (2005), No. 3, pp. 365-373.

Research projects

  • Probabilistic Methods for Microchip-data Analysis (PMMA)
  • Proactive Information Retrieval by Adaptive Models of Users' Attention and Interests (Prima)
  • Minimum Description Length Modeling in Computer Science and Statistics (Minos)
  • Scalable Probabilistic Methods for Next Generation Internet Search Engines (Prose)
  • Superpeer Semantic Search Engine (Alvis)
  • Search-Ina-Box (SIB)
  • Cognitively Inspired Visual Interfaces for Representing Multidimensional Information (CIVI)

International visits

To the group
Guo Hang, Tsinghua University, China, 1 November 2004 – 30 April 2005
Ramakrishna Thanniru, IIT Guwahati, India, 1 May – 31 July 2005
Simon Lacoste-Julien, UC Berkeley, USA, 7 August – 27 August 2005

From the group
Vladimir Poroshin, Tsinghua University, China, 1 September – 4 October 2005
Teemu Roos, CWI Institute, the Netherlands, 28 February – 26 April 2005