Automated Modelling of Density via Multiple Trees
by
Philip Bell
Australian Bureau of Statistics
In the usual "predictive data mining" problem the aim is to predict a response using a set of input variables. One technique for this situation is the multiple additive regression tree approach (MART) of Friedman (IMS 1999 Reitz Lecture). In this paper I look at estimating the joint density of a set of variables, so that any subset of the variables can be predicted conditional on the others. Applications include imputation of missing data, data editing and synthetic estimation. I propose a log-linear model for the joint density, based on the sum of a fixed number of trees, each with a fixed number of two-way splits. I present maximum likelihood and Bayesian approaches to fitting the model; both are iterative, with one tree modified at each iteration. A novel feature is the use of a sample from the model distribution for the evaluation of the likelihood. Markov chain Monte Carlo techniques are used to maintain this model sample.
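As a rough illustration of why a model sample is useful here, one way to write a log-linear density built from a sum of trees is sketched below. The notation (M trees, J splits per tree, leaf regions R_{mj}, leaf values c_{mj}, normalizing constant Z, data x_1, ..., x_N, model sample x̃_1, ..., x̃_S) is assumed for this sketch and is not taken from the paper; the paper's own parameterization may differ.

```latex
% Assumed form: M trees, each with J two-way splits (hence J+1 leaves).
\[
  p(x) = \frac{1}{Z}\exp\Big(\sum_{m=1}^{M} T_m(x)\Big),
  \qquad
  T_m(x) = \sum_{j=1}^{J+1} c_{mj}\,\mathbf{1}[x \in R_{mj}],
  \qquad
  Z = \int \exp\Big(\sum_{m=1}^{M} T_m(x)\Big)\,dx .
\]
% Average log-likelihood of the data x_1, ..., x_N:
\[
  \ell = \frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{M} T_m(x_i) - \log Z .
\]
% Its derivative with respect to a leaf value compares the data with the model;
% the model probability can be approximated from a model sample of size S:
\[
  \frac{\partial \ell}{\partial c_{mj}}
  = \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}[x_i \in R_{mj}] - P_p(x \in R_{mj})
  \approx \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}[x_i \in R_{mj}]
        - \frac{1}{S}\sum_{s=1}^{S}\mathbf{1}[\tilde{x}_s \in R_{mj}] .
\]
```

Under these assumptions the intractable constant Z never has to be computed directly: the terms that depend on it reduce to probabilities under the model, which a sample from the model distribution can estimate. For discrete variables the integral defining Z would be a sum.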
Date received: September 13, 2001
Copyright © 2001 by the author(s). The author(s) of this document and the organizers of the conference have granted their consent to include this abstract in Atlas Conferences Inc. Document # cagd-84.