Publications of the Data Mining Group at the University of Helsinki


Here is a lightly annotated list of some of the group's publications in areas related to data mining.

NOTE! Unfortunately, this list is NOT up to date. For more current lists, see the publications on the home pages of the members of the research group. This list and these pages will be updated ASAP. (mk 6/99)


Association Rules
Sequence Data
Theory
Surveys
Sampling
Machine Learning
Database Design and Data Mining
Discovering Document Structures
Data Mining in Text


Association rules are a simple formalism for expressing positive connections between columns in a 0/1 matrix. A fast algorithm for finding such rules is given in

The following paper shows how association rules can almost always be found in a single database pass, by using a random sample to bootstrap the discovery.

An experiment using a general purpose database management system to support the search for association rules is reported in

The methods above discover large collections of rules, and tools are needed to help in locating the interesting ones. The following two papers consider this problem.

The algorithms for finding association rules work by finding frequent sets of attributes. This approach is also surprisingly useful for finding other types of rules.
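As a rough illustration of the frequent-set idea, here is a minimal sketch of a level-wise search over a small 0/1 matrix, followed by rule generation. It is not the algorithm of any paper listed here; the toy rows, the thresholds, and the names `support`, `frequent_sets`, and `rules` are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy data: each row is the set of columns that hold a 1.
ROWS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]


def support(itemset, rows):
    """Fraction of rows that contain every attribute of itemset."""
    return sum(1 for row in rows if itemset <= row) / len(rows)


def frequent_sets(rows, min_support):
    """Level-wise search: build larger candidates only from frequent sets."""
    items = sorted(set().union(*rows))
    level = [frozenset([item]) for item in items]
    result = {}
    while level:
        counted = {s: support(s, rows) for s in level}
        kept = {s: v for s, v in counted.items() if v >= min_support}
        result.update(kept)
        # Join frequent k-sets into (k+1)-candidates and keep a candidate
        # only if all of its k-subsets are frequent.
        candidates = set()
        for a, b in combinations(kept, 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in kept for sub in combinations(union, len(a))
            ):
                candidates.add(union)
        level = sorted(candidates, key=sorted)
    return result


def rules(freq, min_confidence):
    """Derive rules X -> y; confidence = support(X with y) / support(X)."""
    out = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue
        for y in itemset:
            lhs = itemset - {y}
            confidence = supp / freq[lhs]
            if confidence >= min_confidence:
                out.append((tuple(sorted(lhs)), y, supp, confidence))
    return sorted(out)


if __name__ == "__main__":
    freq = frequent_sets(ROWS, min_support=0.6)
    for lhs, rhs, supp, confidence in rules(freq, min_confidence=0.7):
        print(f"{lhs} -> {rhs}  support={supp:.2f}  confidence={confidence:.2f}")
```

The pruning step relies on the fact that every subset of a frequent set is itself frequent, so a candidate with an infrequent subset never needs to be counted.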


What do you do with large sequences of events? The following paper studies how to find sets of interconnected events from such sequences.

The two different approaches of the above paper were first presented in

and in

The next paper describes a system built for the analysis of telecommunications alarm databases:

A telecommunications view of this system is given in the following paper:

A Bayesian tool for the problem of modelling dependencies between events is described in the following papers:

One of the main goals of knowledge discovery is to produce useful and valuable information for the users. The following papers consider not only the utilization aspect but also the whole discovery process:


There are lots of ad hoc studies in data mining. Could one obtain some general results? A possible framework is given in

An early version of the paper appeared as


The PhD thesis of Hannu Toivonen is not actually a survey, but it covers the important area of the discovery of frequent patterns. Well-known examples of frequent patterns include association rules and episodes. Aspects handled in the work include a generic algorithm for discovering frequent patterns, analyses of such tasks, the use of sampling, and rules with negation and disjunction.


When there is a lot of data to analyze, sampling can ease the task. The following paper considers the relationship between the logical form of sentences and the sample size needed for reliable identification of the sentences.

A similar study in the context of functional dependencies is presented in

The following paper shows how association rules can almost always be found in a single database pass, by using a random sample to bootstrap the discovery.
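To give a feel for how large a sample has to be, here is a minimal sketch based on the standard Hoeffding bound. It is one textbook way to reason about sample sizes and is not necessarily the analysis used in the papers above; the function names and the example parameters are assumptions made for illustration.

```python
import math


def sample_size(epsilon, delta):
    """Rows needed so that the sampled frequency of one fixed set is within
    epsilon of its true frequency with probability at least 1 - delta
    (two-sided Hoeffding bound)."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


def lowered_threshold(min_frequency, n, delta):
    """Threshold to use on a sample of n rows so that a set that is truly
    frequent is missed with probability at most delta (one-sided bound)."""
    return min_frequency - math.sqrt(math.log(1 / delta) / (2 * n))


if __name__ == "__main__":
    n = sample_size(epsilon=0.01, delta=0.01)
    print("sample size:", n)
    print("lowered threshold:", lowered_threshold(0.10, n, delta=0.01))
```

Both bounds apply to a single attribute set. Mining with the lowered threshold makes it less likely that a frequent set is missed in the sample, at the cost of more candidates to check against the full database.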


The following papers are also more or less related to data mining / knowledge discovery, but they have a more classical machine learning orientation.

The paper

tries to find a simple nontrivial class of concepts for which one could say something definite about the approximations to the MDL principle.

Humans seem (at least sometimes) to use rules which have exceptions: if so-and-so, then thus, unless so-and-so2, in which case thus2, etc. Properties of such rule formalisms and how to learn such rules are studied in


In database design, one can use data mining methods to look for integrity constraints in a database instance. The following book covers this and some related issues.

A new, efficient method for discovering functional dependencies and approximate functional dependencies is described in the following paper. The scale-up properties of the algorithm are superior to those of previous algorithms.
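To make the notion of an approximate functional dependency concrete, here is a minimal sketch that checks a single candidate dependency against a table. It only tests one dependency at a time and is not the efficient discovery method of the paper; the error measure used (the smallest fraction of rows that must be removed for the dependency to hold), the toy rows, and the names `fd_error` and `holds` are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy relation; the last row contains a misspelled city.
ROWS = [
    {"zip": "00100", "city": "Helsinki", "country": "FI"},
    {"zip": "00100", "city": "Helsinki", "country": "FI"},
    {"zip": "00200", "city": "Helsinki", "country": "FI"},
    {"zip": "33100", "city": "Tampere", "country": "FI"},
    {"zip": "33100", "city": "Tampre", "country": "FI"},
]


def fd_error(rows, lhs, rhs):
    """Smallest fraction of rows to remove so that lhs -> rhs holds exactly."""
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[attribute] for attribute in lhs)
        groups[key][row[rhs]] += 1
    # Within each group of equal lhs values, keep the most common rhs value.
    kept = sum(max(counter.values()) for counter in groups.values())
    return 1 - kept / len(rows)


def holds(rows, lhs, rhs, max_error=0.0):
    """True if lhs -> rhs holds exactly (max_error=0) or approximately."""
    return fd_error(rows, lhs, rhs) <= max_error


if __name__ == "__main__":
    print(holds(ROWS, ["zip"], "country"))
    print(holds(ROWS, ["zip"], "city"))
    print(holds(ROWS, ["zip"], "city", max_error=0.25))
```

With `max_error=0.0` the check is for an exact dependency; a small positive value tolerates a few inconsistent rows, such as the misspelled city in the toy data.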

One interesting new area is that of so-called inductive databases, in which the database consists of a data part and a pattern part.

Some of the data mining issues in database design are considered in other papers by Heikki Mannila.



Recently, we have analysed document collections using data mining methods. This new field of application, text mining, is closely related to information retrieval and, in our case, text analysis in general.

Last update on May 6, 1998.

This page is maintained by
Hannu.Toivonen@Helsinki.FI      Mika.Klemettinen@Helsinki.FI