
Data Mining

Data mining (or knowledge discovery in databases) is a new research area that develops methods and systems for extracting interesting and useful information from large sets of data. Data mining methods can be used in a variety of application areas, such as commercial databases, telecommunications, and epidemiological data. The area combines techniques from databases, statistics, and machine learning.

The Data Mining research group has developed data mining methods and studied the theory of data mining. The research started in the late 1980s in the context of developing tools for inferring integrity constraints from databases.

We have developed methods for finding recurrent episodes in event sequences, and used these to locate strong rules about the occurrences of events. Clustering methods have also been applied to locate regularities in sequential data. For numerical time-series data we have developed methods that are able to discover similarities in various aspects of potentially related time-series.
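As an illustration of what finding recurrent episodes can mean in practice, the following is a minimal sketch, not the group's actual algorithm. It assumes integer timestamps, a sliding window of fixed width, and a "serial episode" defined as event types that occur in a given order within one window. The names episode_frequency and rule_confidence are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (integer timestamp, event type); an assumed representation


def occurs_serially(window_events: Sequence[str], episode: Sequence[str]) -> bool:
    """True if the episode's event types appear in order (not necessarily adjacent)."""
    it = iter(window_events)
    # Each target consumes the iterator up to its first match, which enforces the order.
    return all(any(e == target for e in it) for target in episode)


def episode_frequency(events: List[Event], episode: Sequence[str], width: int) -> float:
    """Fraction of sliding windows [t, t + width) that contain the serial episode."""
    if not events:
        return 0.0
    events = sorted(events)
    first, last = events[0][0], events[-1][0]
    # Every integer start time whose window overlaps the observed sequence.
    starts = range(first - width + 1, last + 1)
    hits = 0
    for t in starts:
        in_window = [etype for ts, etype in events if t <= ts < t + width]
        if occurs_serially(in_window, episode):
            hits += 1
    return hits / len(starts)


def rule_confidence(events: List[Event], episode: Sequence[str], width: int) -> float:
    """Confidence of the rule 'prefix => full episode': freq(episode) / freq(prefix)."""
    prefix_freq = episode_frequency(events, episode[:-1], width)
    if prefix_freq == 0.0:
        return 0.0
    return episode_frequency(events, episode, width) / prefix_freq


if __name__ == "__main__":
    log = [(1, "A"), (2, "B"), (5, "A"), (6, "B")]
    print(episode_frequency(log, ["A", "B"], width=3))
    print(rule_confidence(log, ["A", "B"], width=3))
```

In this sketch a rule such as "A is followed by B within the window" would be called strong when both its frequency and its confidence exceed user-chosen thresholds. The brute-force window scan is chosen for clarity; it is not efficient on long sequences.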

Data mining can produce large amounts of new information. We are working on the data mining process as a whole and on the selection of the interesting regularities in particular. In connection with the Document Management group, we have considered these issues in the analysis of text and structure in marked-up documents.

The group has studied the theory of data mining, e.g., by looking at the relationship between the logical complexity of the discovered sentences and the sample size needed for discovery, and by investigating various frameworks for data mining.

A growing research topic has been the use of Markov chain Monte Carlo methods in data analysis, in particular in the analysis of event data. We develop tools for the automatic analysis of complex statistical models (Bayesian or full probability models), and we model and analyze data with other scientists, e.g., in epidemiology, paleoecology, and archaeology.
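To make the idea of Markov chain Monte Carlo analysis of event data concrete, here is a minimal sketch under stated assumptions; it does not represent the group's tools or models. It assumes event counts per unit interval follow a Poisson distribution with unknown rate, places a Gamma(prior_shape, prior_rate) prior on the rate, and runs a random-walk Metropolis sampler on the logarithm of the rate. All function names and parameter values are hypothetical.

```python
import math
import random
from typing import List


def log_posterior(theta: float, counts: List[int],
                  prior_shape: float, prior_rate: float) -> float:
    """Unnormalized log posterior of theta = log(rate), including the Jacobian term."""
    total, n = sum(counts), len(counts)
    return (prior_shape + total) * theta - (prior_rate + n) * math.exp(theta)


def metropolis(counts: List[int], n_steps: int = 20000, step: float = 0.3,
               seed: int = 0, prior_shape: float = 1.0,
               prior_rate: float = 1.0) -> List[float]:
    """Random-walk Metropolis on log(rate); returns the sampled rates."""
    rng = random.Random(seed)
    theta = 0.0  # start at rate = 1
    current = log_posterior(theta, counts, prior_shape, prior_rate)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, counts, prior_shape, prior_rate)
        # Accept with probability min(1, posterior ratio); the proposal is symmetric.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(math.exp(theta))
    return samples


if __name__ == "__main__":
    counts = [3, 5, 4, 6, 2]           # made-up illustrative counts
    kept = metropolis(counts)[2000:]   # discard an assumed burn-in period
    print(sum(kept) / len(kept))       # should be near the analytic mean 21/6
```

For this conjugate model the posterior is known in closed form, Gamma(prior_shape + sum of counts, prior_rate + number of intervals), so the sampler can be checked against the analytic posterior mean. The appeal of MCMC is that the same scheme still applies when the model is too complex for such a closed form.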

The research is done in several projects funded by the Academy of Finland, TEKES, and the European ESPRIT Programme. The group has close cooperation with the Document Management and Machine Learning groups.

The members of the Data Mining group are Prof. Heikki Mannila (group leader), Dr. Helena Ahonen, M.Sc. Oskari Heinonen, Ykä Huhtala, M.Sc. Mika Klemettinen, Karri-Pekka Laakso, Tommi Mononen, M.Sc. Vesa Ollikainen, M.Sc. Pirjo Ronkainen, M.Sc., M.Th. Marko Salmenkivi, Jouni Seppänen, Dr. Hannu Toivonen, and Doc. Inkeri Verkamo.

Publications: [20-23, 91, 144, 150, 151, 161-172, 175, 176, 180-186, 207, 219, 234, 235, 238, 240-242, 244-246, 250, 253, 257, 258, 263, 264].

Home Page: http://www.cs.helsinki.fi/research/fdk/datamining/