Adventures in Multicollinearity and Variable Selection
by
Leonardo E. Auslender
SAS Institute, Research and Development
Practitioners who search for data mining models often find that multicollinearity among predictors affects their results. It is well known that the empirical estimation of a linear model, whether by linear or logistic regression, may suffer from multicollinearity. It is less well known how to integrate the avoidance of multicollinearity effects into the process of variable selection during model building. In data mining, where some practitioners claim that a model’s predictive power is its most important goal regardless of interpretability, multicollinearity is often deemed a minor nuisance rather than a problem. In this paper, we describe multicollinearity analytically, examine whether it is a minor nuisance or can be a major hurdle, and review the tools proposed to remedy or prevent it. We analyze whether the present stepwise family of searches and the Foster/Stine search (JASA, June 2004) allow multicollinearity to rear its ugly head. If so, we present a modification of the stepwise family search to verify whether the proposed remedies ameliorate or eliminate this effect.
Date received: August 22, 2005
Copyright © 2005 by the author(s). The author(s) of this document and the organizers of the conference have granted their consent to include this abstract in Atlas Conferences Inc. Document # caqt-52.