sameee · 10-04-2017, 08:00 PM

A NOVEL REPLICA DETECTION SYSTEM USING
BINARY CLASSIFIERS, R-TREES, AND PCA

ABSTRACT

Replica detection is a prerequisite for the discovery of copyright infringement
and detection of illicit content. For this purpose, content-based
systems can be an efficient alternative to watermarking. Rather
than imperceptibly embedding a signal, content-based systems rely
on image similarity. Certain content-based systems use adaptive
classifiers to detect replicas. In such systems, a suspect image is
tested against every original, which can become computationally
prohibitive as the number of original images grows.

INTRODUCTION

The recent progress in multimedia technologies and the advent of the
World Wide Web (Web) have made it possible to process and distribute
digital content at negligible cost. Unfortunately, many valuable digital
images are now illegally redistributed. In this context, both content
protection and detection of copyright infringements become important.
In this paper, we propose a system to detect image replicas. By
the term replica, we refer not only to a bit-exact copy of a given original
image, but also to modified versions of the image after certain
manipulations, malicious or not, as long as these manipulations do
not change the perceptual meaning of the image content. In particular,
replicas include all variants of the original image obtained after
common image processing manipulations such as compression, filtering,
adjustments of contrast, or geometric manipulations.

REPLICA DETECTION SYSTEM

The main idea behind the proposed replica detection system is to use
a binary classifier to determine whether the suspect image is a replica
of an image contained in a database of originals. Although the number
of originals is quite small compared to that of all images on the
Web, it can still be fairly large depending on the application (for example
in the thousands or even millions). When using a set of binary
classifiers, each being able to detect whether a suspect image is a
replica of a specific image in the database, the entire database has to
be sequentially scanned, which quickly becomes cumbersome as the
number of originals grows. Therefore, we propose to use a preprocessing
step based on an indexing structure where, given a suspect
image, the most likely original images are efficiently selected. We
denote the set of likely originals, or candidates, by C. Ideally, C contains
few elements and includes the correct original if the suspect
image is indeed a replica of one of the images in the database.
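
The two-stage idea described above (select a small candidate set C first,
then run only the binary classifiers of those candidates) can be sketched
as follows. This is a minimal illustration under stated assumptions, not the
system's actual implementation: the feature vectors, the box-shaped range
query, the radius, and the distance-threshold classifier are all
hypothetical, and a plain linear scan stands in for the R-tree lookup.

```python
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def select_candidates(suspect: Vector, originals: Dict[str, Vector],
                      radius: float) -> List[str]:
    """Return the candidate set C: originals inside a box around the suspect.

    Assumption: a real system would answer this range query with an R-tree.
    The linear scan below returns the same set, only less efficiently.
    """
    return [
        oid for oid, feat in originals.items()
        if all(abs(a - b) <= radius for a, b in zip(suspect, feat))
    ]


def make_threshold_classifier(original: Vector,
                              max_dist: float) -> Callable[[Vector], bool]:
    """Hypothetical per-original binary classifier (Euclidean threshold)."""
    def classify(x: Vector) -> bool:
        dist = sum((a - b) ** 2 for a, b in zip(x, original)) ** 0.5
        return dist <= max_dist
    return classify


def detect_replica(suspect: Vector, originals: Dict[str, Vector],
                   classifiers: Dict[str, Callable[[Vector], bool]],
                   radius: float) -> Optional[str]:
    """Test the suspect only against the candidates in C, not every original."""
    for oid in select_candidates(suspect, originals, radius):
        if classifiers[oid](suspect):
            return oid
    return None


# Toy database of originals with made-up 2-D features.
originals = {"img_a": (0.10, 0.20), "img_b": (0.80, 0.90), "img_c": (0.15, 0.70)}
classifiers = {oid: make_threshold_classifier(feat, 0.05)
               for oid, feat in originals.items()}

suspect = (0.12, 0.21)
candidates = select_candidates(suspect, originals, radius=0.1)
match = detect_replica(suspect, originals, classifiers, radius=0.1)
```

In this toy setup only one classifier is evaluated for the suspect image
instead of three, which is the saving the preprocessing step is meant to
provide.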

R-tree Performance

The R-tree performance is assessed by measuring the miss rate (i.e.,
the average probability that the R-tree does not return among its results
the corresponding original when the test image is a replica) and
the average number of returned candidates. For this purpose, only the
subsets of Q and S that do not include the non-replica images are
used. Note that in general, the average number
of returned candidates for non-replica images is one less than for
replica images.
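
The two measures defined above can be computed as in the following sketch.
It assumes, hypothetically, that each replica test image is recorded as a
pair of its true original's identifier and the list of candidates returned
by the index; the function name and the sample data are illustrative only
and are not results from the paper.

```python
from typing import List, Sequence, Tuple


def index_performance(
    results: Sequence[Tuple[str, List[str]]]
) -> Tuple[float, float]:
    """Compute (miss rate, average number of returned candidates).

    results holds one (true_original_id, returned_candidate_ids) pair per
    replica test image. A miss is a query whose true original is absent
    from the returned candidates.
    """
    if not results:
        raise ValueError("at least one replica query is required")
    misses = sum(1 for truth, cands in results if truth not in cands)
    total_candidates = sum(len(cands) for _, cands in results)
    n = len(results)
    return misses / n, total_candidates / n


# Made-up outcomes for four replica queries.
queries = [
    ("img_a", ["img_a", "img_c"]),
    ("img_b", ["img_b"]),
    ("img_c", []),            # the index missed the true original
    ("img_a", ["img_a"]),
]
miss_rate, avg_candidates = index_performance(queries)
```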

CONCLUSION

In this work, a replica detection system capable of retrieving from a
database of originals the one that corresponds to a given suspect image
was presented. Since binary classifiers are used by the system,
the suspect image has to be tested against every original contained in
the database.