Mixture model-based clustering of microarray expression data
by
Geoff McLachlan
Department of Mathematics, University of Queensland
Coauthors: David Peel (University of Queensland), Richard Bean (University of Queensland)
We consider the clustering of microarray gene expression data via a model-based approach using mixtures of log normal component distributions. Microarray data collected on N genes from M experiments can be represented by an N x M matrix, whose ith row contains the expression levels for the ith gene in the M tissue samples. A cluster analysis of these data may concern clustering the M tissue samples on the basis of the N genes, or clustering the N genes on the basis of the expression levels in the M tissues available for each gene. As the number of genes N is typically much larger than the number M of available tissues, the first clustering is a nonstandard problem in cluster analysis, since the number of observations M to be clustered is quite small relative to their dimension N. We show how a normal mixture-based approach to clustering can still be used in this nonstandard case by adopting mixtures of factor analyzers. The use of these models is demonstrated by their application to two well-known data sets in the microarray literature, the colon data of Alon et al. (1999) and the leukaemia data of Golub et al. (1999). In these two examples, we make use of the EMMIX-GENE software that we have developed specifically for mixture model-based clustering of gene expression data.
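To illustrate why clustering the M tissue samples is nonstandard, the following minimal sketch counts the free parameters in a g-component normal mixture in p dimensions, first with unrestricted component-covariance matrices and then under a factor-analytic form in which each component covariance is written as B B^T + D, with B a p x q loading matrix and D diagonal. The function names and the sizes used (2000 genes, 62 tissues, g = 2, q = 3) are assumptions chosen for illustration only; they are not taken from the abstract and this is not the EMMIX-GENE implementation.

```python
def full_covariance_params(p, g):
    """Free parameters in a g-component normal mixture in p dimensions
    with unrestricted component-covariance matrices."""
    mixing = g - 1                      # mixing proportions sum to one
    means = g * p                       # one mean vector per component
    covariances = g * p * (p + 1) // 2  # symmetric p x p matrix per component
    return mixing + means + covariances


def factor_analyzer_params(p, g, q):
    """Free parameters when each component covariance has the form
    B B^T + D, with B a p x q loading matrix and D diagonal."""
    mixing = g - 1
    means = g * p
    # p*q loadings plus p diagonal terms, less q(q-1)/2 for the
    # rotational indeterminacy of the loadings
    covariances = g * (p * q + p - q * (q - 1) // 2)
    return mixing + means + covariances


# Illustrative sizes only (assumed, not from the abstract).
n_genes, n_tissues = 2000, 62
g, q = 2, 3

print("observations to cluster:", n_tissues)
print("unrestricted covariances:", full_covariance_params(n_genes, g))
print("factor-analytic covariances:", factor_analyzer_params(n_genes, g, q))
```

With sizes like these, the unrestricted model has several million parameters to be estimated from a few dozen observations, whereas the factor-analytic form reduces the count by more than two orders of magnitude. It is still large relative to M, which suggests that some reduction in the number of genes may also be needed in practice.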
http://www.maths.uq.edu.au/~gjm
Date received: July 17, 2001
Copyright © 2001 by the author(s). The author(s) of this document and the organizers of the conference have granted their consent to include this abstract in Atlas Conferences Inc. Document # cahg-05.