The Fight against Spam - A Machine Learning Approach

This paper briefly surveys the ongoing fight between spammers and antispam software developers and describes new approaches to spam filtering. The first two sections survey existing spam types. Some well-documented spammer tricks are also described; because the imagination of spam distributors is endless, only the most common tricks are covered. We also present spam-blocking techniques integrated into current spam filters. In the Methodology and Results sections we describe our implementation of itemsets-based, Naïve Bayes, and LSI classifiers for classifying email messages into spam and non-spam (ham) categories.

Keywords: spam, ham, unsolicited mail, e-mail, spam filter, antispam, whitelist, graylist, blacklist, machine learning, naive Bayes, itemsets, LSI, latent semantic indexing, heuristics, classification

Year: 2007

Download: Full text [204 kB]

Authors of this publication:

Karel Ježek

Phone: +420 377632475

Karel is the former group coordinator and a supervisor of PhD students working on the Group's research projects.

Jiří Hynek

Phone: +420 603492837

Jiri, a co-founder of the Text-Mining Research Group, works as a lecturer at the Dept. of Computer Science and Engineering. His research interests include machine learning and language-related problems. His teaching focuses on good writing style and technical writing in general.

Related Projects:


Document Classification

Authors: Jiří Hynek, Karel Ježek, Michal Toman, Roman Tesař, Zdeněk Češka, Petr Grolmus
Desc.: Use of inductive machine learning methods in the classification of short text documents.