E-Mail Spam Filtering with Local SVM Classifiers

Blanzieri, Enrico and Bryl, Anton (2008) E-Mail Spam Filtering with Local SVM Classifiers. UNSPECIFIED. (Unpublished)


    Abstract

    This paper describes an e-mail spam filter based on local SVM, that is, an SVM classifier trained only on a neighborhood of the message to be classified rather than on all available training data. Two problems are stated and solved. First, the choice of neighborhood size is shown to be critical; our solution is based on estimating the a posteriori probability of a correct decision, and the resulting algorithm is called highest probability SVM nearest neighbor (HP-SVM-NN). Second, to apply the algorithm in practice, we propose a practical filter architecture based on HP-SVM-NN. Extensive tests on the SpamAssassin corpus and the TREC 2005 Spam Track corpus show that HP-SVM-NN outperforms pure SVM and is applicable in practice. Finally, we explore the locality properties of the two corpora using Sammon’s projection.
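    The local SVM idea can be sketched as follows: find the k training messages closest to the incoming message, train an SVM on those neighbors alone, and classify the message with that local model. This is a minimal pure-Python sketch, not the authors' implementation. Assumptions: messages are already encoded as numeric feature vectors, labels are +1 (spam) and -1 (ham), distance is Euclidean, and the SVM is linear and fitted by plain stochastic subgradient descent. HP-SVM-NN chooses k from an estimate of the posterior probability of a correct decision; that estimator is not reproduced here. The hypothetical `predict_best_k` instead uses a sigmoid of the local decision value as a crude confidence stand-in. All function names are illustrative.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100, seed=0):
    """Linear SVM (hinge loss + L2 penalty on w) fitted by plain SGD.

    Labels in y must be +1 or -1. Returns the weight vector w and bias b.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            if y[i] * score < 1:
                # Inside the margin: hinge-loss subgradient is active.
                w = [wj - lr * (lam * wj - y[i] * xj) for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:
                # Only the L2 regularizer contributes (bias is not regularized).
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b


def local_svm_predict(X, y, query, k):
    """Classify `query` with an SVM trained only on its k nearest neighbors.

    Returns (label, confidence). The confidence is a HYPOTHETICAL stand-in
    (sigmoid of |decision value|), not the posterior estimate of HP-SVM-NN.
    """
    nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], query))[:k]
    Xk = [X[i] for i in nearest]
    yk = [y[i] for i in nearest]
    if len(set(yk)) == 1:
        # All neighbors share one label: no SVM needs to be trained.
        return yk[0], 1.0
    w, b = train_linear_svm(Xk, yk)
    score = sum(wj * qj for wj, qj in zip(w, query)) + b
    label = 1 if score >= 0 else -1
    confidence = 1.0 / (1.0 + math.exp(-abs(score)))
    return label, confidence


def predict_best_k(X, y, query, ks):
    """HYPOTHETICAL k-selection rule: keep the most confident local prediction.

    Returns (label, confidence, k).
    """
    return max(
        (local_svm_predict(X, y, query, k) + (k,) for k in ks),
        key=lambda result: result[1],
    )


if __name__ == "__main__":
    # Toy one-dimensional data: negative values are ham (-1), positive are spam (+1).
    X = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
    y = [-1, -1, -1, 1, 1, 1]
    print(local_svm_predict(X, y, [3.5], k=2))  # → (1, 1.0)
    print(local_svm_predict(X, y, [0.5], k=4)[0])
    print(predict_best_k(X, y, [0.5], ks=[2, 4]))
```

    Because a fresh SVM is trained for every incoming message, the per-message cost grows with k; a neighborhood whose messages all carry the same label is resolved without any training.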

    Item Type: Departmental Technical Report
    Department or Research center: Information Engineering and Computer Science
    Subjects: Q Science > QA Mathematics > QA076 Computer software
    Report Number: DISI-08-013
    Repository staff approval on: 26 Mar 2008
