Efficient Techniques for Online Record Linkage
#1


Abstract

The need to consolidate the information contained in heterogeneous data sources has been widely documented in recent
years. In order to accomplish this goal, an organization must resolve several types of heterogeneity problems, especially the entity
heterogeneity problem that arises when the same real-world entity type is represented using different identifiers in different data
sources. Statistical record linkage techniques could be used for resolving this problem.

INTRODUCTION

The last few decades have witnessed a tremendous
increase in the use of computerized databases for
supporting a variety of business decisions. The data needed
to support these decisions are often scattered in heterogeneous
distributed databases. In such cases, it may be
necessary to link records in multiple databases so that one
can consolidate and use the data pertaining to the same real-world
entity.

MOTIVATIONAL EXAMPLES

In order to motivate the problem context and illustrate the
usefulness of the sequential approaches presented in this
paper, we provide two real-world examples: the first one is
drawn from insurance claims processing, and the second
from crime investigation.
Example: Insurance Claims Processing
Consider the following situation in a large city with four
major health insurance companies, each with several
million subscribers. Each insurance company processes
more than 10,000 claims a day; manual handling of this
huge volume could take significant human effort resulting
in high personnel and error costs. A few years ago, the
health insurance companies and the medical providers in
the area agreed to automate the entire process of claims
filing, handling, payment, and notification. In the automated
process, a medical service provider files health
insurance claims electronically using information (about
patients and services provided) stored in the provider
database. A specialized computer program at the insurance
company then processes each claim, issues payments to
appropriate parties, and notifies the subscriber.

Sequential Record Linkage and Matching Tree

The sequential approach decides on the next best
attribute to acquire, based upon the comparison results of
the previously acquired attributes. The acquisition of
attributes can be expressed in the form of a matching tree
as shown in Fig. 1. This tree can be used in the following
manner: Starting at the root, we acquire attribute Y3 first.
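The traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Fig. 1 is not reproduced in this post, so the tree shape, the attribute names other than Y3, and the helper names (`Node`, `Leaf`, `classify`, `fetch_remote`) are all hypothetical, and exact equality stands in for whatever comparison function the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # "match" or "non-match"


@dataclass
class Node:
    attribute: str                      # attribute to acquire at this node
    if_agree: Union["Node", Leaf]       # branch taken when values agree
    if_disagree: Union["Node", Leaf]    # branch taken when values disagree


def classify(
    tree: Union[Node, Leaf],
    local_record: Dict[str, str],
    fetch_remote: Callable[[str], str],
) -> Tuple[str, List[str]]:
    """Walk the tree, acquiring one remote attribute per step.

    Returns the leaf label and the list of attributes actually fetched,
    which is a rough proxy for the communication overhead.
    """
    fetched: List[str] = []
    node = tree
    while isinstance(node, Node):
        remote_value = fetch_remote(node.attribute)  # one attribute only
        fetched.append(node.attribute)
        agrees = local_record[node.attribute] == remote_value
        node = node.if_agree if agrees else node.if_disagree
    return node.label, fetched


# Hypothetical tree: Y3 at the root (as in the text), then Y1 if Y3 agrees.
tree = Node(
    "Y3",
    if_agree=Node("Y1", Leaf("match"), Leaf("non-match")),
    if_disagree=Leaf("non-match"),
)

local = {"Y1": "Smith", "Y2": "1970-01-01", "Y3": "12345"}
remote_same = {"Y1": "Smith", "Y2": "1970-01-01", "Y3": "12345"}
remote_other = {"Y1": "Jones", "Y2": "1982-05-09", "Y3": "99999"}

result_same = classify(tree, local, remote_same.__getitem__)
result_other = classify(tree, local, remote_other.__getitem__)
```

In this sketch the second remote record is rejected after fetching Y3 alone, so Y1 and Y2 are never transferred; that early exit is the point of acquiring attributes sequentially.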

#2
Abstract:
Sequential online record linkage is the task of quickly and accurately identifying heterogeneous records that correspond to the same entity in one or more data sources. It addresses the case where the same real-world entity type is represented using different identifiers in different databases. It uses a matching technique to reduce the communication overhead. More specifically, the sequential approach does not transfer all the attributes of the remote records to the local site; instead, it acquires and compares a single attribute at a time.

Aim & Objective
To combine and secure the information contained in heterogeneous data sources.
To develop a matching tree, similar to a decision tree, and use it.
To reduce the communication overhead significantly.
To identify the closeness between records using a PBM (Probability Based Model).

Problem Statement
An important issue associated with record linkage in distributed environments is that of schema integration. For record linkage techniques to work well, one should be able to identify the common non-key attributes between two databases. To address this issue, the proposed approach develops a matching tree, similar to a decision tree, and uses it to derive techniques that reduce the communication overhead significantly.

Contribution
Record linkage in general has been studied extensively, and a solid probabilistic decision framework has been proposed, along with several extensions and specific estimation methods. The Probability Based Model (PBM) is used to identify heterogeneous records accurately across various data sources. Online record linkage determines whether pairs of data records describe the same entity (i.e., it finds record pairs that are co-referent). The entities are mainly people or organizations. The main advantage of this method is that it joins the heterogeneous data sources and identifies the specific record. It also removes duplicate data from the data sources.
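As a rough illustration of how a probabilistic model can score a record pair, here is a minimal sketch in the style of classical log-likelihood match weights (the Fellegi-Sunter formulation). The post does not say which formulation the PBM uses, so treat this as an assumption. The attribute names, the m and u probabilities, the thresholds, and the function names `match_weight` and `decide` are all made up for the example.

```python
import math
from typing import Dict


def match_weight(
    agreements: Dict[str, bool],
    m: Dict[str, float],
    u: Dict[str, float],
) -> float:
    """Sum per-attribute log2 likelihood ratios.

    m[a]: assumed probability that attribute a agrees for a true match.
    u[a]: assumed probability that attribute a agrees for a non-match.
    """
    total = 0.0
    for attr, agrees in agreements.items():
        if agrees:
            total += math.log2(m[attr] / u[attr])
        else:
            total += math.log2((1.0 - m[attr]) / (1.0 - u[attr]))
    return total


def decide(weight: float, upper: float, lower: float) -> str:
    """Three-way decision using two hypothetical thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


# Hypothetical parameters for two common non-key attributes.
m = {"name": 0.9, "zip": 0.8}
u = {"name": 0.1, "zip": 0.2}

w_agree = match_weight({"name": True, "zip": True}, m, u)
w_mixed = match_weight({"name": True, "zip": False}, m, u)
w_disagree = match_weight({"name": False, "zip": False}, m, u)
```

A pair whose total weight clears the upper threshold is declared a match, one below the lower threshold a non-match, and anything in between is left for further comparison or manual review.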

Software Requirements
Windows / XP / Dotnet / SQL
(or)
Java / Servlet / JSP / JDBC