University of Helsinki Department of Computer Science
 

Annual report 2006

Research projects

Intelligent systems

Statistical Multilingual Analysis for Retrieval and Translation (SMART)

Period: 10/2006-9/2009
Researchers: Juho Rousu, Wray Buntine, Matti Kääriäinen, Huizhen Yu, Kimmo Valtonen, Ville Tuulos, Antti Tuominen, Matti Vuorinen
Funding: EU

The goal of this project is to develop new statistical and machine-learning methods for multilingual information retrieval and machine translation. In machine translation, the main aims are to update translation models automatically on the basis of user feedback and to improve the fluency of rough translations. In multilingual information retrieval, the discovery of latent, language-independent features is of particular interest. The academic partners of the project are the universities of Southampton and Bristol, University College London, Università degli Studi di Milano, the Josef Stefan Institute and the National Research Council Canada.

Probabilistic Methods for Microarray Data Analysis (PMMA)

Period: 1/2004-12/2007
Researchers: Petri Myllymäki, Jorma Rissanen, Teemu Roos, Hannes Wettig, Jussi Lahtinen, Tomi Silander
Funding: Tekes

The goal of this project is to develop new probabilistic methods for the analysis of microarray data. The research focuses especially on the following sub-areas: removing noise from microarray chip images, developing compression-estimation (comprestimation) methods, grouping and classifying genes, building gene regulation networks, and evaluating the results. The research consortium consists of three participant groups: the Laboratory for Computational Engineering, Helsinki University of Technology (in the charge of Jukka Heinonen, DSc (Tech)), the Institute of Biomedicine, University of Helsinki (in the charge of Professor Tomi Mäkelä) and the Department of Computer Science (in the charge of Professor Petri Myllymäki).

In 2006, the project developed an algorithm for finding the optimal Bayesian network; it is suitable for cases with 30 or fewer variables. The empirical testing of the algorithm is still ongoing. In addition, the project has studied methods for parallelizing Bayesian network learning algorithms.

MDL-Based Methods for Image Denoising (KUKOT)

Period: 1/2006-12/2007
Researchers: Petri Myllymäki, Jorma Rissanen, Teemu Roos, Hannes Wettig, Petri Kontkanen, Tommi Mononen
Funding: Tekes

The digital bit streams processed in the ICT sector can be considered as consisting of two overlapping parts: useful information and useless noise. There is noise in all digital media; it is generated by faults in the original information sources (such as poor image resolution) and by errors in signal transmission (such as disruptions in wireless communications or faults in hard drives). Noise can be filtered if the features of the source are known (at least to some degree), but it is very difficult to build general methods for denoising, since they have to be able to construct adaptive models of random noise sources. The main problem with such adaptive modelling is the regularization of models: overly complex (over-adaptive) models will interpret noise as part of the information and are thus rendered useless.

MDL (Minimum Description Length) is an information-theoretic framework developed by Jorma Rissanen, the father of arithmetic coding. It provides an elegant solution to this problem. Unfortunately, methods based on MDL theory are often computationally very demanding. Building on the latest results in MDL theory, the project has collaborated with Jorma Rissanen to develop new, computationally efficient general denoising components for the processing of image signals. The results can be used either for more efficient compression of signals, leading to more efficient transmission of images, or for enhancing the quality of received signals without adding too much to the amount of digital information being sent. The functionality of the methods developed in the project will be tested on material provided by cooperation partners and on public material. The research consortium consists of two sub-groups: the Department of Computer Science at the University of Helsinki (in the charge of Petri Myllymäki) and the Laboratory of Computational Technology at Helsinki University of Technology (in the charge of Jukka Heikkonen, DSc (Tech)). Additional information: http://ww.mdl-research.org
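The regularization problem described above can be illustrated with a small sketch. It is not the project's method: it assumes a crude two-part code length (a Gaussian residual term plus a penalty of half log n per parameter) instead of the refined MDL criteria the project builds on, and it denoises a one-dimensional toy signal by block averaging rather than an image. All function names and the test signal are hypothetical.

```python
import math
import random


def block_means(signal, k):
    """Denoise by replacing each of k equal-width blocks with its mean."""
    width = len(signal) // k
    fitted = []
    for b in range(k):
        block = signal[b * width:(b + 1) * width]
        block_mean = sum(block) / len(block)
        fitted.extend([block_mean] * len(block))
    return fitted


def description_length(signal, k):
    """Approximate two-part code length (in nats) of a k-block model.

    First term: cost of encoding the residuals under a Gaussian model.
    Second term: cost of encoding the k block means (assumed penalty).
    """
    n = len(signal)
    fitted = block_means(signal, k)
    rss = sum((y - f) ** 2 for y, f in zip(signal, fitted))
    return 0.5 * n * math.log(rss / n) + 0.5 * k * math.log(n)


def select_block_count(signal, candidates):
    """Pick the model complexity with the shortest total description."""
    return min(candidates, key=lambda k: description_length(signal, k))


# Toy data: a four-level step signal with additive Gaussian noise.
rng = random.Random(0)
levels = [0.0, 3.0, 1.0, 4.0]
clean = [level for level in levels for _ in range(64)]
noisy = [value + rng.gauss(0.0, 0.5) for value in clean]

candidates = [1, 2, 4, 8, 16, 32, 64]
best_k = select_block_count(noisy, candidates)
denoised = block_means(noisy, best_k)
```

Models with more blocks always fit the noisy samples more closely, but their longer parameter description outweighs the gain once the extra blocks only track noise, which is the over-adaptation the text warns about.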

Search-Ina-Box (SIB)

Period: 3/2003-6/2007
Researchers: Petri Myllymäki, Wray Buntine, Jussi Lahtinen, Jaakko Löfström, Jukka Perkiö, Vladimir Poroshin, Antti Tuominen, Ville Tuulos, Kimmo Valtonen
Funding: Tekes, National Board of Patents and Registration, Nokia, Wisane, M-Brain

The SIB project develops mutually supportive, next-generation methods for semantic information retrieval and personalization based on automatic analysis of tera- and peta-scale information sources. These methods have been integrated into a set of prototypes that are tested in different pilot environments, such as corporate information-management systems, topic-based search engines, analysis of news articles, and public intelligent search engines. Since information retrieval will be a mainstay of future information networks, the potential applications of the SIB technology are numerous.

The methods developed in the SIB project form the basic technology for future web-based information-management systems, both in the internal information networks of companies and in open systems providing Internet information (such as Internet search engines). There are three partners in the research consortium: the Department of Computer Science at the University of Helsinki / the Helsinki Institute for Information Technology HIIT (Professor Petri Myllymäki), the Department of Computer Sciences at the University of Tampere (Professor Kari-Jouko Räihä), and the Department of Health Policy and Management at the University of Kuopio (Professor Olli-Pekka Ryynänen). Additional information: http://cosco.hiit.fi/search/

Scalable Probabilistic Methods for the Next Generation Search Engine (PROSE)

Period: 1/2003-12/2006
Researchers: Petri Myllymäki, Wray Buntine, Jussi Lahtinen, Jaakko Löfström, Jukka Perkiö, Vladimir Poroshin, Antti Tuominen, Ville Tuulos, Kimmo Valtonen
Funding: The Academy of Finland

The aim of the project is to study the modern computational statistical methods needed for developing next-generation Internet search engines, as well as scalable, efficient implementations of these methods. The work focuses on developing statistical modelling techniques such as multinomial principal component analysis (mPCA).

In addition to theoretical and analytical research and the development of methods, the project has studied how well suited the methods are for very large document collections (in the gigabyte and terabyte class). Such methods are necessary for implementing the more advanced features of search engines, such as multi-class grouping, automatic formation of topic hierarchies in document collections, and intelligent query distribution to search engine clusters that specialize in different topic areas. In addition to the basic research on methods, the project developed software libraries based on open-source libraries for scientific computation. The software libraries developed in the project can be used for the efficient implementation of various functions in the core concept map of search engine clusters. Additional information: http://cosco.hiit.fi/search/

 

Cognitively Inspired Visual Interfaces for Representing Multidimensional Information (CIVI)

Period: 1/2005-12/2008
Researchers: Petri Myllymäki, Jussi Lahtinen, Petri Kontkanen, Pekka Uronen

Funding: The Academy of Finland

The CIVI project studies how to visualize the multidimensional information that is available to everyone through, for example, different search engines. On the one hand, the question is studied as a mathematical dimension reduction problem; on the other, as a challenge in perceptual psychology. This interdisciplinary research is carried out in a two-university consortium, comprising the Cosco group at the University of Helsinki, led by Professor Petri Myllymäki, and Docent Ilpo Kojo's research group at the CKIR unit at Helsinki School of Economics.

 

Advanced data analysis in vision research

Period: 01/2004-12/2006

Researchers: Aapo Hyvärinen, Ilmari Kurki

Funding: The Academy of Finland

We develop new ways of analysing data measured on the performance of the human visual system. Our approach is based on a recently developed experimental paradigm, so-called classification images. This is a co-operation with the Helsinki University Department of Psychology.

Statistical modelling of image and video data

Period: 04/2003-12/2009

Researchers: Aapo Hyvärinen, Jarmo Hurri, Mika Inki, Urs Köster, Jussi Lindgren

Funding: HIIT/BRU, The Academy of Finland, HeCSE, an international foundation

We develop new statistical models of image and video data. The models are useful for research on both human and computer vision (incl. image processing). In 2006, we mainly developed models of the statistical characteristics of quadratic features. At the beginning of 2006, the consortium XtraVision, funded by the Academy of Finland neuroscience research programme, started its operations. Within the consortium, we cooperated with experimental neuroscience researchers. Aapo Hyvärinen is the leader of the consortium.

Non-Gaussian Bayesian networks for causal discovery

Period: 01/2005-12/2007

Researchers: Patrik Hoyer, Aapo Hyvärinen, Antti Kerminen, Markus Palviainen, Shohei Shimizu

Funding: HIIT/BRU, The Academy of Finland, an international foundation

The goal of statistical data analysis is often to find causal relations between observed variables. However, traditional statistical methods usually cannot analyse causal relations. Recently, various methods that are assumed to be able to discover causal relations have been developed by analysing the effects of hypothetical interventions. In this project, we strive to develop new methods for causal analysis by combining two families of methods: Bayesian networks and independent component analysis.
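Why non-Gaussianity helps in causal discovery can be illustrated with a toy two-variable sketch. It is not the project's method: instead of independent component analysis, it assumes a much simpler pairwise test that regresses each variable on the other and uses the correlation between squared regressor and squared residual as a rough proxy for dependence. The function names, the uniform noise and the linear model y = x + e are all assumptions made for the example.

```python
import math
import random


def center(values):
    """Subtract the sample mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def correlation(a, b):
    """Pearson correlation of two equally long samples."""
    a, b = center(a), center(b)
    cov = sum(p * q for p, q in zip(a, b))
    return cov / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))


def dependence_after_regression(cause, effect):
    """Regress effect on cause; return a proxy for residual dependence.

    In the true causal direction the residual is independent of the
    regressor, so the correlation of their squares is close to zero.
    """
    cause, effect = center(cause), center(effect)
    slope = sum(c * e for c, e in zip(cause, effect)) / sum(c * c for c in cause)
    residual = [e - slope * c for c, e in zip(cause, effect)]
    return abs(correlation([c * c for c in cause], [r * r for r in residual]))


def infer_direction(x, y):
    """Prefer the direction whose residual looks more independent."""
    forward = dependence_after_regression(x, y)
    backward = dependence_after_regression(y, x)
    return "x -> y" if forward < backward else "y -> x"


# Toy data: x causes y, with non-Gaussian (uniform) source and noise.
rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
y = [xi + rng.uniform(-1.0, 1.0) for xi in x]

direction = infer_direction(x, y)
```

With Gaussian variables both regressions would leave residuals that look equally independent and the direction could not be identified; the asymmetry appears only because the variables are non-Gaussian.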