FIMXII-SCMA2005@AUBURN, Twelfth Annual International Conference on Statistics, Combinatorics, Mathematics and Applications
December 2-4, 2005
Auburn University
Auburn, Alabama, USA

Organizers
Forum for Interdisciplinary Mathematics

Feature selection and machine learning with mass spectrometry data for distinguishing cancer and non-cancer samples
by
Susmita Datta
Department of Bioinformatics and Biostatistics, University of Louisville
Coauthors: Lara M. DePadilla

In this talk, I present a comparative study of various clustering and classification algorithms applied to distinguishing cancer from non-cancer protein samples using mass spectrometry data. Our study demonstrates the usefulness of a feature selection step before a machine learning tool is applied. A natural and common feature selection tool is the collection of marginal p-values from t-tests of the intensity difference at each m/z ratio between the cancer and non-cancer samples. We study how the choice of p-value cutoff, set to control the overall Type I error rate, affects the performance of the clustering and classification algorithms run on the significant features. For the classification problem, we also consider m/z selection using the importance measures computed by Breiman's Random Forest algorithm. Using proteomic data on serum from ovarian cancer patients and from cancer-free individuals, taken from the Food and Drug Administration/National Cancer Institute Clinical Proteomics Database, we compare the net effect of each combination of machine learning algorithm, feature selection tool, and cutoff criterion on performance, as measured by an appropriate error rate.
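
The marginal t-test selection step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' procedure: the data are synthetic Gaussian intensities, the function names `welch_t_pvalue` and `select_features` are hypothetical, the overall Type I error rate is controlled with a Bonferroni cutoff (the abstract does not say which correction was used), and p-values come from a normal approximation to the Welch t statistic so that the example needs only the Python standard library. With small samples, an exact t distribution would be the better choice.

```python
import random
from statistics import NormalDist, mean, variance


def welch_t_pvalue(a, b):
    """Two-sided p-value for a difference in means between samples a and b.

    Uses the Welch t statistic with a normal approximation to its null
    distribution (an assumption; reasonable only for moderately large samples).
    """
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    if se == 0:
        return 1.0
    t = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(t)))


def select_features(cancer, control, alpha=0.05):
    """Return indices of m/z features passing a Bonferroni-corrected cutoff.

    cancer and control are lists of samples; each sample is a list of
    intensities, one per m/z ratio.
    """
    n_features = len(cancer[0])
    cutoff = alpha / n_features  # Bonferroni control of the family-wise error rate
    selected = []
    for j in range(n_features):
        p = welch_t_pvalue([row[j] for row in cancer],
                           [row[j] for row in control])
        if p <= cutoff:
            selected.append(j)
    return selected


# Synthetic data: 40 samples per group, 200 m/z features, of which three
# (hypothetical indices) have a shifted mean intensity in the cancer group.
rng = random.Random(0)
n_samples, n_features = 40, 200
informative = {3, 57, 120}
control = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
           for _ in range(n_samples)]
cancer = [[rng.gauss(3.0 if j in informative else 0.0, 1.0)
           for j in range(n_features)]
          for _ in range(n_samples)]

selected = select_features(cancer, control, alpha=0.05)
print("selected m/z feature indices:", selected)
```

Changing `alpha` changes how many features reach the downstream clustering or classification step, which is the cutoff effect the study examines.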

Keywords: Mass spectrometry, high throughput data, clustering, classification, machine learning, microarray.

Date received: September 30, 2005


Copyright © 2005 by the author(s). The author(s) of this document and the organizers of the conference have granted their consent to include this abstract in Atlas Conferences Inc. Document # carm-06.