Blanzieri, Enrico and Bryl, Anton (2008) E-Mail Spam Filtering with Local SVM Classifiers. UNSPECIFIED. (Unpublished)
This paper describes an e-mail spam filter based on local SVM, namely on the SVM classifier trained only on a neighborhood of the message to be classified, and not on the whole training data available. Two problems are stated and solved. First, the selection of the right size of neighborhood is shown to be critical; our solution is based on the estimation of the a-posteriori probability of the correct decision, and the resulting algorithm is called highest probability SVM nearest neighbor (HP-SVM-NN). The second problem is the application of the algorithm in practice, and we propose a practical filter architecture based on HP-SVM-NN. Extensive testing is performed on SpamAssassin corpus and TREC 2005 Spam Track corpus, showing that HP-SVM-NN outperforms pure SVM and is applicable in practice. Finally, we explore the locality properties of the two corpora using Sammon’s projection.
Actions (login required)