|
Organizers |
Wavelets and the Curse of Dimensionality in Data Mining
by
Ole Nielsen
Computer Sciences Lab, RSISE, Australian National University
Coauthors: Markus Hegland (Australian National University), Zuowei Shen (National University of Singapore), Stephen Roberts (Australian National University)
A fundamental issue in Data Mining is the development of algorithms to extract some useful information from very large databases. One important technique is to estimate a smooth surface approximating the data. However, the number of observations can be of the order of millions and there may be thousands of variables recorded so one has to deal with the socalled "curse of dimensionality". To overcome this curse, additive models have been used, but - while they do reduce the effective dimensionality - their approximation properties are often inadequate.
We propose a method for approximating a high dimensional surface by computing a number of simpler surfaces of the same dimension, but on grids where the total number of gridpoints is equal to the resolution in one dimension only. These surfaces are then used to estimate wavelet coefficients of the full surface from which the approximation are computed.
We demonstrate that the algorithmic complexity is essentially proportional to the total number of desired gridpoints and that good approximation properties are maintained - even for less smooth data.
Date received: July 22, 1999
Copyright © 1999 by the author(s). The author(s) of this document and the organizers of the conference have granted their consent to include this abstract in Atlas Conferences Inc. Document # cadk-37.