Tools and techniques for decision tree learning

Tapio Elomaa: Tools and techniques for decision tree learning. PhD Thesis, Report A-1996-2, Department of Computer Science, University of Helsinki, May 1996. 116+24 pages. <http://www.cs.helsinki.fi/TR/A-1996/2>

Full paper: gzip'ed Postscript file
Covers of full paper tar'ed file
Metadata: XML file

Abstract

Decision tree learning is an important field of machine learning. In this study we examine both formal and practical aspects of decision tree learning. We aim at answering to two important needs: The need for better motivated decision tree learners and an environment facilitating experimentation with inductive learning algorithms. As results we obtain new practical tools and useful techniques for decision tree learning.

First, we derive the practical decision tree learner Rank based on the Findmin protocol of Ehrenfeucht and Haussler. The motivation for the changes introduced to the method comes from empirical experience, but we prove the correctness of the modifications in the probably approximately correct learning framework. The algorithm is enhanced by extending it to operate in the multiclass situations, making it capable of working within the incremental setting, and providing noise tolerance into it. Together these modifications entail practicability through a formal development process, which constitutes an important technique for decision tree learner design.

The other tool that comes out of this work is TELA, a general testbed for all inductive learners using attribute representation of data, not only for decision tree learners. This system guides and assists its user in taking new algorithms to his disposal, operating them in an easy fashion, designing and executing useful tests with the algorithms, and in interpreting the outcome of the tests. We present the design rationale, current composition, and future development directions of TELA. Moreover, we reflect on the experiences that have been gathered in the initial usage of the system.

The tools that come about are evaluated and validated in empirical tests over many real-world application domains. Several successful inductive algorithms are contrasted with the Rank algorithm in experiments that are carried out using TELA. These experiments let us evaluate the success of the new decision tree learner with respect to its established equivalents and validate the utility of the developed testbed. The tests prove successful in both respects: Rank attains the same overall level of prediction accuracy as C4.5, which is generally considered to be one of the best empirical decision tree learners, and TELA eases the execution of the experiments substantially.

Index Terms

Categories and Subject Descriptors:
I.2.6 Learning [ARTIFICIAL INTELLIGENCE]: Concept learning, Induction, Knowledge acquisition
E.1 Data Structures [DATA]: Trees
F.2.2 Nonnumerical Algorithms and Problems [ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY] Pattern matching

General Terms: Learning, Algorithms

Additional Key Words and Phrases: machine learning, inductive concept learning, decision trees, computational learning theory, empirical experiments

Online Publications of Department of Computer Science, Anna Pienimäki

Department of Computer Science

Department information

Tools and techniques for decision tree learning

Abstract

Index Terms