10-04-2017, 07:52 PM
Efficient Techniques for Online Record Linkage
Abstract
The need to consolidate the information contained in heterogeneous data sources has been widely documented in recent
years. In order to accomplish this goal, an organization must resolve several types of heterogeneity problems, especially the entity
heterogeneity problem that arises when the same real-world entity type is represented using different identifiers in different data
sources. Statistical record linkage techniques could be used for resolving this problem.
INTRODUCTION
The last few decades have witnessed a tremendous
increase in the use of computerized databases for
supporting a variety of business decisions. The data needed
to support these decisions are often scattered in heterogeneous
distributed databases. In such cases, it may be
necessary to link records in multiple databases so that one
can consolidate and use the data pertaining to the same real-world
entity.
MOTIVATIONAL EXAMPLES
In order to motivate the problem context and illustrate the
usefulness of the sequential approaches presented in this
paper, we provide two real-world examples: the first one is
drawn from insurance claims processing, and the second
from crime investigation.
Example: Insurance Claims Processing
Consider the following situation in a large city with four
major health insurance companies, each with several
million subscribers. Each insurance company processes
more than 10,000 claims a day; manual handling of this
huge volume could take significant human effort resulting
in high personnel and error costs. A few years ago, the
health insurance companies and the medical providers in
the area agreed to automate the entire process of claims
filing, handling, payment, and notification. In the automated
process, a medical service provider files health
insurance claims electronically using information (about
patients and services provided) stored in the provider
database. A specialized computer program at the insurance
company then processes each claim, issues payments to
appropriate parties, and notifies the subscriber.
Sequential Record Linkage and Matching Tree
The sequential approach decides on the next best
attribute to acquire, based upon the comparison results of
the previously acquired attributes. The acquisition of
attributes can be expressed in the form of a matching tree
as shown in Fig. 1. This tree can be used in the following
manner: Starting at the root, we acquire attribute Y3 first.
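The traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the tree shape, the attribute names other than Y3 at the root, the sample values, and the helper names (Leaf, Node, classify, fetch_remote) are all hypothetical, and attribute comparison is simplified to exact equality.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # final decision, e.g. "match" or "non-match"


@dataclass
class Node:
    attribute: str       # attribute to acquire at this node
    if_agree: "Tree"     # subtree followed when local and remote values agree
    if_disagree: "Tree"  # subtree followed when they differ


Tree = Union[Leaf, Node]


def classify(
    tree: Tree,
    local_record: Dict[str, str],
    fetch_remote: Callable[[str], str],
) -> Tuple[str, List[str]]:
    """Walk the matching tree, acquiring one remote attribute per node.

    Returns the leaf label and the list of attributes actually acquired,
    which stands in for the communication cost of the decision.
    """
    acquired: List[str] = []
    node = tree
    while isinstance(node, Node):
        remote_value = fetch_remote(node.attribute)  # one remote acquisition
        acquired.append(node.attribute)
        agrees = remote_value == local_record[node.attribute]
        node = node.if_agree if agrees else node.if_disagree
    return node.label, acquired


# Hypothetical tree: Y3 at the root (as in the text); the rest is assumed.
tree = Node(
    attribute="Y3",
    if_agree=Node("Y1", if_agree=Leaf("match"), if_disagree=Leaf("non-match")),
    if_disagree=Leaf("non-match"),
)

local = {"Y1": "Smith", "Y2": "1970-01-01", "Y3": "12345"}
remote_same = {"Y1": "Smith", "Y2": "1970-01-01", "Y3": "12345"}
remote_other = {"Y1": "Smith", "Y2": "1970-01-01", "Y3": "99999"}

result_same = classify(tree, local, remote_same.__getitem__)
result_other = classify(tree, local, remote_other.__getitem__)
```

In this sketch, the second remote record is rejected after acquiring only Y3, which shows why acquiring attributes sequentially can transfer less data than fetching every attribute of every remote record up front.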