Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data
AUTHOR(S)
Tan, Yongxi
SOURCE
Oxford University Press
ABSTRACT
DNA microarray technology offers a promising approach to the genome-wide diagnosis and prognosis of tumors by monitoring the expression levels of thousands of genes simultaneously. A major problem with microarray data is the difficulty of analyzing high-dimensional gene expression profiles. These profiles typically contain thousands of variables (genes) but far fewer observations (samples), and they often exhibit severe collinearity. This makes it difficult to apply classical statistical methods directly to microarray data. In this paper, total principal component regression (TPCR) is proposed to classify human tumors by extracting the latent variable structure underlying microarray data from the augmented subspace of both the independent and the dependent variables. A salient feature of our method is that it accounts not only for the latent variable structure but also for the errors in the microarray gene expression profiles (the independent variables). The prediction performance of TPCR was evaluated by both leave-one-out and leave-half-out cross-validation on four well-known microarray datasets. The stability and reliability of the classification models were further assessed by re-randomization and permutation studies. A fast kernel algorithm was applied to reduce the computation time dramatically. (MATLAB source code is available upon request.)
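The abstract does not spell out the TPCR equations, so the following Python/NumPy block is only a minimal sketch of the general idea it describes, not the authors' MATLAB implementation. It swaps in a related, well-known technique: rank-k truncated total least squares (TTLS) on the centred augmented matrix Z = [X | Y], where Y is assumed to be a one-hot class-indicator matrix. Because the full matrix of right singular vectors V is orthogonal, the usual TTLS estimator B = -V12 pinv(V22), built from the discarded right singular vectors, can under mild rank conditions be rewritten as B = V11 (V11^T V11)^-1 V21^T, which needs only the top-k vectors. Those vectors are obtained here from the n x n matrix Z Z^T, which is cheap when samples are far fewer than genes; this kernel route is an assumption about what a "fast kernel algorithm" might look like, not the paper's algorithm. The function names (`fit_augmented_tls`, `predict`, `leave_one_out_accuracy`), the choice k = 2, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_augmented_tls(X, labels, k):
    """Hypothetical sketch: rank-k truncated TLS on the augmented matrix [X | Y].

    Not the paper's TPCR algorithm. Y is assumed to be a one-hot indicator matrix.
    """
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)  # n x q class indicators
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Z = np.hstack([X - x_mean, Y - y_mean])                  # centred n x (p + q) matrix

    # Kernel route (assumption): eigendecompose the small n x n matrix Z Z^T
    # instead of the (p + q) x (p + q) matrix Z^T Z, which helps when n << p.
    evals, U = np.linalg.eigh(Z @ Z.T)                       # ascending eigenvalues
    top = np.argsort(evals)[::-1][:k]
    s = np.sqrt(evals[top])                                  # top-k singular values of Z
    V_k = Z.T @ U[:, top] / s                                # top-k right singular vectors

    p = X.shape[1]
    V11, V21 = V_k[:p], V_k[p:]                              # X-block and Y-block loadings
    # TTLS coefficients B = -V12 pinv(V22), rewritten via orthogonality of V
    # as B = V11 (V11^T V11)^-1 V21^T, so only the top-k vectors are needed.
    B = V11 @ np.linalg.solve(V11.T @ V11, V21.T)            # p x q coefficients
    return {"B": B, "x_mean": x_mean, "y_mean": y_mean, "classes": classes}


def predict(model, X_new):
    """Assign each row of X_new to the class with the largest predicted indicator."""
    scores = (X_new - model["x_mean"]) @ model["B"] + model["y_mean"]
    return model["classes"][np.argmax(scores, axis=1)]


def leave_one_out_accuracy(X, labels, k):
    """Refit on n - 1 samples and score the held-out sample, n times."""
    n = len(labels)
    hits = 0
    for i in range(n):
        train = np.arange(n) != i
        model = fit_augmented_tls(X[train], labels[train], k)
        hits += int(predict(model, X[i:i + 1])[0] == labels[i])
    return hits / n


# Synthetic stand-in for a microarray dataset: 30 samples, 200 "genes", 3 classes,
# with 5 hypothetical marker genes shifted upward in each class.
rng = np.random.default_rng(0)
n_per_class, p, n_classes = 10, 200, 3
labels = np.repeat(np.arange(n_classes), n_per_class)
X = rng.normal(0.0, 0.5, size=(n_per_class * n_classes, p))
for c in range(n_classes):
    X[labels == c, 5 * c:5 * c + 5] += 3.0

print("LOO accuracy:", leave_one_out_accuracy(X, labels, k=2))
```

Leave-half-out cross-validation would differ only in which indices are held out in each fold, and a permutation check could rerun the same loop with shuffled `labels`. Accuracy on this well-separated toy data says nothing about performance on real microarray datasets.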
ARTICLE ACCESS
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=546133
RELATED DOCUMENTS
- Multiclass classification of microarray data with repeated measurements: application to cancer
- Multiclass cancer diagnosis using tumor gene expression signatures
- Fuzzy rules extraction from support vector machines (SVM) for multi-class classification
- Recursive partitioning for tumor classification with gene expression microarray data
- Knowledge-based analysis of microarray gene expression data by using support vector machines