Use of Partial Least Squares in Microarray Data
by
Susmita Datta
Georgia State University
Microarray technology has revolutionized the way gene functions are monitored. Analysis of microarray data is a fast-growing research area at the interface of biology, computer science, and statistics. While various clustering and classification techniques have been successfully employed to group genes by the similarity of their expression patterns, much remains to be learned about the interrelationships among the expression levels of different genes. We approach this problem with partial least squares, a statistical technique capable of modeling a large number of variables. We use it to analyze a publicly available microarray data set on sporulation of budding yeast. We investigate a number of representative genes from each temporal group (based on the time of first induction) of positively expressed genes and show that, in each case, most of the variability is explained by only two partial regression terms based on all remaining genes. By comparing the genes with the largest coefficients for a set of representative genes, we identify a handful of genes that appear to exert significant control over the expression levels in the temporal group. We also show that the coefficients of the partial least squares models yield a dissimilarity measure suitable for use in a clustering algorithm to group genes.
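The regression step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes the standard single-response NIPALS form of partial least squares (PLS1), and the function name `pls1`, the helper `dot`, and the synthetic expression data are hypothetical. The sketch regresses one gene's expression profile on all remaining genes using two components and reports the fraction of variability explained, along with the genes carrying the largest coefficients.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def pls1(X, y, n_components=2):
    """Single-response PLS via NIPALS (an assumed, textbook variant).

    X: list of n rows (time points), each a list of p predictor genes.
    y: list of n expression values for the response gene.
    Returns (coef, explained): p coefficients for the centered predictors
    and the fraction of the response's sum of squares explained.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    total_ss = dot(f, f)
    coef = [0.0] * p
    rotations, loadings = [], []
    for _ in range(n_components):
        # Weight vector: direction in gene space most covariant with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(E[i], w) for i in range(n)]  # scores
        tt = dot(t, t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = dot(f, t) / tt
        # Rotation vector maps the original centered X directly to scores.
        r = list(w)
        for r_prev, p_prev in zip(rotations, loadings):
            a = dot(p_prev, w)
            r = [rv - a * rp for rv, rp in zip(r, r_prev)]
        rotations.append(r)
        loadings.append(load)
        coef = [c + q * rv for c, rv in zip(coef, r)]
        # Deflate predictors and response before the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
    explained = 1.0 - dot(f, f) / total_ss if total_ss > 0 else 0.0
    return coef, explained


# Synthetic stand-in for an expression matrix (NOT the yeast data):
# 8 hypothetical time points, 30 hypothetical genes driven by two profiles.
rng = random.Random(0)
n_times, n_genes = 8, 30
early = [math.exp(-0.5 * t) for t in range(n_times)]
late = [1.0 - math.exp(-0.3 * t) for t in range(n_times)]
mix = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_genes)]
profiles = [[a * early[t] + b * late[t] + rng.gauss(0, 0.05) for a, b in mix]
            for t in range(n_times)]

target = [row[0] for row in profiles]       # response: gene 0
predictors = [row[1:] for row in profiles]  # all remaining genes
coef, explained = pls1(predictors, target, n_components=2)
top = sorted(range(len(coef)), key=lambda j: abs(coef[j]), reverse=True)[:5]
print("fraction of variability explained:", round(explained, 3))
print("predictor genes with largest |coefficient|:", [j + 1 for j in top])
```

Because the number of genes far exceeds the number of time points, ordinary least squares is not identifiable in this setting; partial least squares sidesteps this by regressing on a few score vectors built from all predictors.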
Date received: November 11, 2001
Copyright © 2001 by the author(s). The author(s) of this document and the organizers of the conference have granted their consent to include this abstract in Atlas Conferences Inc. Document # caid-68.