A Collaborative Spam Detection System with a Novel E-Mail Abstraction Scheme

Abstract

E-mail communication is indispensable nowadays, but the e-mail spam problem continues to grow drastically. In recent
years, the notion of collaborative spam filtering with a near-duplicate similarity matching scheme has been widely discussed. The primary
idea of the similarity matching scheme for spam detection is to maintain a known spam database, formed by user feedback, to block
subsequent near-duplicate spams.

INTRODUCTION

E-mail communication is prevalent and indispensable
nowadays. However, the threat of unsolicited junk e-mails,
also known as spams, is becoming more and more
serious. According to a survey by the website TopTenREVIEWS
[11], 40 percent of e-mails were considered spam
in 2006. Statistics collected by MessageLabs show that
the spam rate has recently exceeded 70 percent and
remains persistently high. The primary challenge of the spam detection
problem is that spammers will always find new
ways to attack spam filters, owing to the economic benefits of
sending spams. Note that existing filters generally perform
well when dealing with clumsy spams, which have
duplicate content with suspicious keywords or are sent
from an identical notorious server.

Definition of Near-Duplicate

The central idea of near-duplicate spam detection is to exploit
reported known spams to block subsequent ones which have
similar content. For different forms of e-mail representation,
the definitions of similarity between two e-mails are diverse.
Unlike most prior works representing e-mails based mainly
on content text, we investigate representing each e-mail
using an HTML tag sequence.

(Excerpted from IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 5, May 2011.)
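
To make the tag-sequence idea concrete, here is a minimal Python sketch. It is not the paper's SAG procedure or its SpTable/SpTrees structures, which this excerpt does not spell out. It assumes an e-mail is reduced to the ordered list of its HTML tag names, and that two e-mails count as near-duplicates when those lists are identical. The names TagSequenceParser, tag_sequence, and KnownSpamDatabase are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects tag names in document order; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <P> and <p> look the same.
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")


def tag_sequence(html_body):
    """Return the e-mail's HTML tag sequence as a hashable tuple."""
    parser = TagSequenceParser()
    parser.feed(html_body)
    parser.close()
    return tuple(parser.tags)


class KnownSpamDatabase:
    """Hypothetical stand-in for a spam database built from user feedback.

    Assumption: matching is exact equality of tag sequences, a deliberate
    simplification of near-duplicate matching.
    """

    def __init__(self):
        self._sequences = set()

    def report_spam(self, html_body):
        # A user reports an e-mail as spam; store its tag sequence.
        self._sequences.add(tag_sequence(html_body))

    def is_near_duplicate(self, html_body):
        # A new e-mail matches if its tag sequence was reported before.
        return tag_sequence(html_body) in self._sequences


reported = '<html><body><p>Cheap pills!</p><a href="http://a.example">Buy</a></body></html>'
variant = '<HTML><BODY><P>Ch3ap p1lls now</P><A HREF="http://b.example">Order</A></BODY></HTML>'
ham = '<html><body><h1>Minutes</h1><ul><li>Item one</li></ul></body></html>'

db = KnownSpamDatabase()
db.report_spam(reported)
print(db.is_near_duplicate(variant))  # same tags, different text and links
print(db.is_near_duplicate(ham))      # different tag structure
```

In this sketch the variant matches the reported spam even though its text and links were rewritten, because only the tag structure is compared. An e-mail with a different layout does not match.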

E-MAIL ABSTRACTION SCHEME

In this section, a novel e-mail abstraction scheme is
introduced. In Section 3.1, procedure SAG is presented to
depict the generation process of an e-mail abstraction. The
devised data structures SpTable and SpTrees are illustrated
in Section 3.2. Finally, the robustness issue is discussed in
Section 3.3.

Structure Abstraction Generation

We propose the specific procedure SAG to generate the e-mail
abstraction using the HTML content of an e-mail. SAG is elaborated
with the example of Fig. 3, and the algorithmic form of SAG is
outlined in Fig. 1.