Last updated 15 January 1999

University of Helsinki
Rolf Nevanlinna Institute and Department of Computer Science

Guest lectures


Luc Dehaspe

Department of Computer Science, K.U.Leuven, Belgium

Frequent Pattern Discovery and Decision Tree Induction in First-Order Logic

Date: Wednesday, 20 January 1999
Place: Department of Computer Science, Lecture hall A414 (Teollisuuskatu 23, 4th floor)
Time: 16:45 - 18:15 (note the hours!)

Hands-on exercises with Warmr and Tilde

Date: Thursday, 21 January 1999
Place: Department of Computer Science, Room D326 (Teollisuuskatu 23, 3rd floor)
Time: 16 - 18


Abstracts

Frequent Pattern Discovery and Decision Tree Induction in First-Order Logic

Wed 20 Jan at 4:45 pm, A414

I will discuss a general formulation of two central data mining tasks, where both the database and the patterns are represented in some subset of first-order logic. These tasks are frequent pattern discovery and decision tree induction.

In recent years, the usage and size of databases have grown dramatically, due to a constant decrease in the cost of both the collection and the storage of huge amounts of data. The need for tools to exploit the popular Data Warehouse has grown accordingly and has given rise to a rapidly evolving research field at the intersection of statistics, databases and machine learning: Data Mining and Knowledge Discovery in Databases (KDD).

Within KDD, the discovery of frequent patterns has been studied in a variety of settings. In its simplest form, known from association rule mining, the task is to discover all frequent item sets, i.e., all combinations of items that are found in a sufficient number of examples. The fundamental task of association rule and frequent set discovery has been extended in various directions, allowing more useful patterns to be discovered with special-purpose algorithms. A unified representation in first-order logic gives insight into the blurred picture of the frequent pattern discovery domain: within the first-order logic formulation, a number of dimensions appear that reconnect settings that had diverged. I will present the Warmr algorithm for frequent pattern discovery in first-order logic, which is well suited to exploratory data mining: it offers the flexibility required to experiment with standard and, in particular, novel settings not supported by special-purpose algorithms.
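As an illustration of the simplest setting mentioned above (frequent item sets over plain transactions), here is a minimal level-wise sketch in Python. It is not Warmr and does not involve first-order logic; the function name `frequent_itemsets`, the `min_support` parameter and the toy `baskets` data are assumptions made up for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {item set: count} for every item set found in at least
    `min_support` transactions (hypothetical helper, illustration only)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Candidate generation: join frequent (k-1)-sets and keep only
        # k-sets whose (k-1)-subsets are all frequent (the pruning step).
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev
                    for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Counting step: one pass over the data per level.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


# Toy data, invented for this sketch.
baskets = [
    {"bread", "milk"},
    {"bread", "beer"},
    {"bread", "milk", "beer"},
    {"milk"},
]
result = frequent_itemsets(baskets, min_support=2)
```

With a threshold of two transactions, the pair {milk, beer} occurs only once and is dropped, so the three-item set is never generated as a candidate. This pruning of candidates with an infrequent subset is what keeps the level-wise search tractable.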

Warmr upgrades the well-known Apriori algorithm to first-order logic. At K.U.Leuven a number of similar upgrades have been realized. As a second example of the "Leuven strategy", I will present the Tilde system, which is an adaptation of C4.5 and induces first-order logical decision trees.

To conclude, I will demonstrate the scientific and commercial potential of our approach via an application in chemical toxicology, where the task is to identify cancer-causing chemical substances.

Hands-on exercises with Warmr and Tilde

Thu 21 Jan at 4 pm, D324

This session is meant as a first introduction to the practice of data mining in first-order logic. Participants will be guided through the steps involved in setting up an application with Warmr and Tilde, two tools developed at K.U.Leuven for frequent pattern discovery and decision tree induction in first-order logic. No knowledge of logic (Prolog) will be assumed.


Welcome!

Hannu Toivonen