Atlas home || Conferences | Abstracts | about Atlas

Computational Techniques and Applications Conference and Workshops - CTAC99
September 20-24, 1999
The Australian National University
Canberra, ACT, Australia

Organizers
Mike Osborne, Bob Gingold, Steve Roberts, David Harrar II, Thanh Tran, Bob Anderssen, Henry Gardner, Markus Hegland, Lutz Grosz

View Abstracts
Conference Homepage

Efficient Techniques for Tree Structured Data Mining
by
Robert A. Pearson
UNSW

Regression and Classification trees can be used for Data Mining. For predictioned values which are continuous a boosted regression tree approach (Friedman 1999) can be applied.

This will produce a general function that maps the independent variables onto the observed one.

A number of possible techniques for discovering the tree are derived. A naive implimentation has a computation time that increases with at least the square of the numbers of examples at each node.

A more efficient technique is to sort the examples at the node. When the number of values of a variable is significantly less than the number of examples an efficient technique is to map the values to the integers. A version of the code that requires only a single pass at the node can then be designed and implimented.

If the values that are tested are only required at limited accuracy a version of a refinement search can be used.

This can be designed so that each single pass of the data at a node increases the number of accurate digits.

If there are a large number of examples a version that requires a single sort initially, and then retrieves the values in sorted order is possible. The current program of this requires a very large memory and three passes of the data at each the parent of the current node.

For small nodes the sort strategy may actually be more efficient. For boosted regression trees the single sort version has a particular advantage as no nodes are small.

All of these strategies are described in detail. The particular advantages and disadvantages of each strategy are discussed. Timing comparisons of different strategies applied to the same data are include examples of boosted regression trees and individual classification trees.

http://www.cs.adfa.edu.au/~rap/rulefinder

Date received: July 25, 1999


Copyright © 1999 by the author(s). The author(s) of this document and the organizers of the conference have granted their consent to include this abstract in Atlas Conferences Inc. Document # cadk-46.