Malossini, Andrea and Blanzieri, Enrico and Ng, Raymond (2004) Assessment of SVM Reliability for Microarray Data Analysis. UNSPECIFIED. (Unpublished)
The goal of our research is to provide techniques that can assess and validate the results of SVM-based analysis of microarray data. We present preliminary results of the effect of mislabeled training samples. We conducted several systematic experiments on artificial and real medical data using SVMs. We systematically flipped the labels of a fraction of the training data. We show that a relatively small number of mislabeled examples can dramatically decrease the performance as visualized on the ROC graphs. This phenomenon persists even if the dimensionality of the input space is drastically decreased, by using for example feature selection. Moreover we show that for SVM recursive feature elimination, even a small fraction of mislabeled samples can completely change the resulting set of genes. This work is an extended version of the previous paper [MBN04].
Actions (login required)