vineeth · 10-04-2017, 07:51 PM

Bringing Order to the Web

Towards Semantic Web Mining
1.1 The Semantic Web
The growing use of the World Wide Web makes it harder to optimize the exchange of
information, because a huge amount of its data can be interpreted by humans only.
The Semantic Web is Tim Berners-Lee's idea of enriching the Web with
machine-understandable information that supports users in their tasks.
Machine-processable information can, for instance, lead a search engine to more
relevant pages and so improve precision and recall.
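
Precision and recall can be made concrete with a small sketch. The page identifiers and the two result lists below are hypothetical and only illustrate the definitions: precision is the share of retrieved pages that are relevant, recall is the share of relevant pages that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for the result list of one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant pages that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four pages are relevant to the query.
relevant_pages = {"p1", "p2", "p3", "p4"}
keyword_results = ["p1", "p5", "p6", "p7"]   # assumed plain keyword search
semantic_results = ["p1", "p2", "p3", "p5"]  # assumed search using annotations

print(precision_recall(keyword_results, relevant_pages))
print(precision_recall(semantic_results, relevant_pages))
```

In this made-up case the second result list contains more relevant pages, so both measures rise together; in practice the two often trade off against each other.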

1.2 Web Mining
The characteristic feature of Web Mining is the use of Data Mining techniques to analyze
the content, structure and usage of Web resources. Web Mining is an invaluable help
in the transformation from human-understandable content to machine-understandable
semantics.

1.3 Extracting Semantics from the Web
The precondition for managing knowledge automatically, instead of accessing unstructured
material, is to add semantic annotations to Web documents. All approaches
discussed here assist the knowledge engineer in extracting the semantics, but cannot
completely replace the engineer. A computer can hardly take full account of background
knowledge, experience or social conventions.

1.4 Exploiting Semantics for Web Mining
Semantics can be exploited for different purposes. The first major application area is the
explicit encoding of semantics for mining Web content. In [BHS02] the input data
is preprocessed, and ontology-based heuristics for feature selection and feature aggregation
are applied. Based on the resulting representations, multiple clustering results are
computed with the K-Means algorithm. These results can be explained by the corresponding
selection of concepts in the ontology.
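
The idea of aggregating features along an ontology before clustering can be sketched as follows. This is not the method of [BHS02], only a toy illustration under stated assumptions: the ontology, the terms, the documents and the choice of initial centroids are all hypothetical, and the K-Means routine is a plain textbook version.

```python
from collections import Counter

# Hypothetical toy ontology: each term is mapped to its parent concept.
ONTOLOGY = {
    "poodle": "animal", "beagle": "animal", "cat": "animal",
    "sedan": "vehicle", "truck": "vehicle", "bus": "vehicle",
}
CONCEPTS = sorted(set(ONTOLOGY.values()))  # ["animal", "vehicle"]


def aggregate(tokens):
    """Feature aggregation: count occurrences per ontology concept."""
    counts = Counter(ONTOLOGY[t] for t in tokens if t in ONTOLOGY)
    return [float(counts[c]) for c in CONCEPTS]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=10):
    """Plain K-Means with fixed initial centroids; returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical documents, already tokenized.
docs = [
    ["poodle", "cat", "beagle"],
    ["truck", "bus"],
    ["cat", "cat", "sedan"],
    ["sedan", "truck", "bus", "poodle"],
]
vectors = [aggregate(d) for d in docs]
labels, centroids = kmeans(vectors, initial_centroids=vectors[:2])

# "Explain" each cluster by the concept with the largest centroid weight.
explanations = [CONCEPTS[c.index(max(c))] for c in centroids]
print(labels, explanations)
```

Because the features are concepts rather than raw terms, each cluster centroid can be read directly as a weighting over ontology concepts, which is what makes the result explainable.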

Mining the Semantic Web
In the Semantic Web, content and structure are strongly intertwined. Therefore the distinction
between structure mining and content mining vanishes. An important group of techniques
for this setting is formed by Relational Data Mining. It comprises techniques for classification,
regression, clustering and association analysis that look for patterns involving
multiple relations in a relational database. These algorithms can be adapted
to deal with RDF or ontology-based data. Usage mining can be enhanced further if
the semantics are contained explicitly in the pages through references to concepts of ontologies.
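
As a rough illustration of a pattern that spans several relations, the sketch below treats RDF data as a list of (subject, predicate, object) triples and measures how often one hand-picked rule holds. The triples, the predicate names and the rule itself are hypothetical; a real relational mining algorithm would search for such rules instead of checking a single given one.

```python
# Hypothetical RDF-like data as (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "globex"),
    ("acme", "locatedIn", "berlin"),
    ("globex", "locatedIn", "paris"),
    ("alice", "livesIn", "berlin"),
    ("bob", "livesIn", "potsdam"),
    ("carol", "livesIn", "paris"),
]


def pairs(triples, predicate):
    """View one predicate as a binary relation: a set of (subject, object)."""
    return {(s, o) for s, p, o in triples if p == predicate}


def rule_stats(triples):
    """Support and confidence of the assumed rule
    worksAt(x, org) AND locatedIn(org, city) => livesIn(x, city)."""
    works = pairs(triples, "worksAt")
    located = pairs(triples, "locatedIn")
    lives = pairs(triples, "livesIn")
    # Rule body: join the two relations on the organisation.
    body = {(person, city)
            for person, org in works
            for org2, city in located
            if org == org2}
    matches = body & lives  # body instances where the head also holds
    support = len(matches)
    confidence = support / len(body) if body else 0.0
    return support, confidence


support, confidence = rule_stats(TRIPLES)
print(support, round(confidence, 2))
```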

Structure of the Web
The structure of the Web can be viewed as a graph with about 150 million nodes (Web pages)
and 1.7 billion edges (hyperlinks). If Web pages A and B link to a page C, A and B are
called the backlinks of C. This is illustrated in Figure 2. In general, highly
linked pages, that is, pages with many backlinks, are more important. But important
backlinks are often few in number. For example, a Web page with a single backlink from
Yahoo should be ranked higher than a page with a couple of backlinks from unknown or
private sites. A Web page has a high rank if the sum of the ranks of its backlinks is
high.
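
The relation between backlinks and rank can be sketched with a small iteration. Several things here are assumptions not stated in the text above: each page divides its rank evenly among its outgoing links (the usual PageRank formulation), a damping factor of 0.85 is used, and the rank of pages without outgoing links is spread over all pages. The link graph and page names are hypothetical.

```python
def backlinks(links):
    """Invert {page: [pages it links to]} into {page: [its backlinks]}."""
    result = {page: [] for page in links}
    for source, targets in links.items():
        for target in targets:
            result.setdefault(target, []).append(source)
    return result


def rank(links, damping=0.85, iterations=50):
    """Iteratively compute ranks; damping and iteration count are assumed values."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    back = backlinks(links)
    for _ in range(iterations):
        # Rank held by pages without outgoing links is shared by all pages.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        new_ranks = {}
        for p in pages:
            # A page's rank grows with the ranks of its backlinks.
            incoming = sum(ranks[q] / len(links[q]) for q in back.get(p, []))
            new_ranks[p] = (1 - damping) / n + damping * (incoming + dangling / n)
        ranks = new_ranks
    return ranks


# Hypothetical graph: "x" has one backlink from a heavily linked "portal",
# "y" has two backlinks from pages that nobody links to.
links = {
    "portal": ["x"],
    "a": ["portal", "y"],
    "b": ["portal", "y"],
    "c": ["portal"],
    "d": ["portal"],
    "x": [],
    "y": [],
}
ranks = rank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph "x" ends up ranked above "y" although it has fewer backlinks, which mirrors the Yahoo example: one backlink from a highly ranked page can outweigh several from obscure ones.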
..