o_mars_2010 · 10-04-2017, 08:16 PM

Prepared by: RAJESH T.M

Abstract
Data mining is a technology used in many disciplines to search for significant relationships among variables or components in large data sets. It enables organizations to use their current reporting capabilities to uncover and understand hidden patterns in vast databases. Data mining is used mainly in applications such as customer relationship management, healthcare, space applications, and bioinformatics. In this study, we have concentrated on the application of data mining in an educational environment. We have reviewed the case study on higher education conducted by Jing Luan, Chief Planning and Research Officer, Cabrillo College, and Founder, Knowledge Discovery Laboratories-SPS. We have also conducted a literature review of various data mining techniques and suggested the best approach for implementing data mining in an academic portfolio.

1. Introduction
Data mining combines an explicit knowledge base, sophisticated analytical skills, and domain knowledge to uncover hidden trends and patterns. These trends and patterns form the basis of predictive models that enable analysts to produce new observations from existing data. Gartner Inc.'s definition of data mining is the most comprehensive: "the process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories, and by using pattern recognition technologies as well as statistical and mathematical techniques." Data mining is performed on very large or raw data sets using either supervised or unsupervised algorithms.
Data mining is the extraction of interesting knowledge (rules, regularities, patterns, and constraints) from data in large databases.
Data mining is a multidisciplinary research and application area that aims to discover novel and useful knowledge from vast databases, drawing on methods from artificial intelligence, statistics, and databases.

1.1 Data Mining
Data mining is the extraction, or "mining", of knowledge from large amounts of data. It is the data-driven discovery and modeling of hidden patterns (patterns we never knew existed) in large volumes of data. It can also be described as the extraction of implicit, previously unknown, unexpected, and potentially extremely useful information from data.

2. Literature Review
According to Laura Squier, data mining is a popular buzzword for a class of techniques that find patterns in data. Data mining is a user-centric, interactive process that leverages analysis technologies and computing power. It comprises a group of techniques that find relationships that have not previously been discovered and that are not reliant on an existing database. It is a relatively easy task that requires knowledge of the business problem or subject-matter expertise.

2.1 Data Mining Techniques
2.1.1 Cluster analysis
Cluster analysis [4] is one of the techniques used in data mining. It groups objects with similar characteristics, and each group is called a cluster. Cluster analysis is used in fields such as marketing, image processing, geographical information systems, biology, and genetics.
Cluster analysis [4] is a multivariate analysis technique in which individuals with similar characteristics are identified and grouped accordingly. Through cluster analysis, dense and sparse regions in the distribution can be identified and different distribution patterns discovered.
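To make the grouping idea concrete, here is a minimal sketch of k-means, one common clustering algorithm. The text above does not name a specific algorithm, so the choice of k-means is an assumption, as are the function name `kmeans` and the student data. The sketch assumes points are numeric tuples and uses a farthest-first initialization so that the result is deterministic.

```python
import math


def kmeans(points, k, max_iter=100):
    """Partition points (tuples of numbers) into k clusters with k-means.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    # Farthest-first initialization: start from the first point, then keep
    # adding the point that is farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical (hours studied, exam score / 10) pairs for six students.
students = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(students, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The two well-separated groups of students end up in different clusters; the centroids mark the dense regions of the distribution.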
2.1.2 Decision Tree
A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify the strategy most likely to reach a goal. Many data mining algorithms and tools stop at discovered customer models, producing distribution information on customer profiles. Decision tree algorithms, when applied to industrial problems such as customer relationship management (CRM), are useful for pointing out customers who are likely to leave (attrition) and customers who are loyal, but they require human experts to post-process the discovered knowledge manually. Another use of decision trees is as a descriptive means for calculating conditional probabilities [6][7].
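As a rough illustration of how a tree-induction algorithm might decide which customer attribute to test first, the sketch below uses information gain, the splitting criterion of ID3; the source does not say which algorithm it has in mind, so this choice is an assumption. The customer records and attribute names (`contract`, `calls`, `churn`) are hypothetical. The `leaf_probabilities` helper shows the "conditional probabilities" use: each branch estimates the probability of churn given the attribute value.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Drop in target entropy obtained by splitting rows on one attribute."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        # Weight each branch's entropy by the share of rows that reach it.
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


def best_split(rows, attributes, target):
    """Attribute an ID3-style learner would test first (highest gain)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


def leaf_probabilities(rows, attribute, target, positive):
    """Estimate P(target == positive | attribute == value) for each value."""
    probs = {}
    for value in sorted({r[attribute] for r in rows}):
        subset = [r for r in rows if r[attribute] == value]
        probs[value] = sum(r[target] == positive for r in subset) / len(subset)
    return probs


# Hypothetical customer records: contract type, support-call volume, churned?
customers = [
    {"contract": "monthly", "calls": "high", "churn": "yes"},
    {"contract": "monthly", "calls": "low", "churn": "yes"},
    {"contract": "monthly", "calls": "high", "churn": "yes"},
    {"contract": "monthly", "calls": "low", "churn": "no"},
    {"contract": "annual", "calls": "high", "churn": "no"},
    {"contract": "annual", "calls": "low", "churn": "no"},
    {"contract": "annual", "calls": "high", "churn": "no"},
    {"contract": "annual", "calls": "low", "churn": "no"},
]
print(best_split(customers, ["contract", "calls"], "churn"))  # contract
print(leaf_probabilities(customers, "contract", "churn", "yes"))
# {'annual': 0.0, 'monthly': 0.75}
```

A human expert would still need to interpret such a split, for example deciding whether monthly-contract customers warrant a retention offer.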
In decision analysis, a "decision tree" and the closely related influence diagram are used as visual and analytical decision support tools, in which the expected values (or expected utilities) of competing alternatives are calculated.
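For reference, the expected value of an alternative a whose outcomes i occur with probabilities p_i and payoffs v_i is the probability-weighted sum below; replacing v_i with a utility u(v_i) gives the expected utility. The numbers in the second line are hypothetical: a risky alternative paying 100 with probability 0.6 and -50 with probability 0.4, compared with a safe alternative that pays 30 for certain.

```latex
\[
\mathrm{EV}(a) = \sum_{i=1}^{n} p_i \, v_i, \qquad \sum_{i=1}^{n} p_i = 1
\]
\[
\mathrm{EV}(\text{risky}) = 0.6 \times 100 + 0.4 \times (-50) = 40 > 30 = \mathrm{EV}(\text{safe})
\]
```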
A decision tree consists of three types of nodes:
i. Decision nodes - commonly represented by squares
ii. Chance nodes - represented by circles
iii. End nodes - represented by triangles
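The three node types map naturally onto a small recursive data structure. The following is a minimal sketch, assuming the usual "rollback" evaluation from decision analysis: end nodes return their payoff, chance nodes return the probability-weighted average of their branches, and decision nodes take the best option. The class names, functions, and the student-retention example are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class EndNode:
    """End node (drawn as a triangle): a final outcome with a payoff."""
    payoff: float


@dataclass
class ChanceNode:
    """Chance node (drawn as a circle): branches are (probability, child) pairs."""
    branches: List[Tuple[float, "Node"]]


@dataclass
class DecisionNode:
    """Decision node (drawn as a square): options are (label, child) pairs."""
    options: List[Tuple[str, "Node"]]


Node = Union[EndNode, ChanceNode, DecisionNode]


def expected_value(node: Node) -> float:
    """Roll the tree back from its end nodes to the given node."""
    if isinstance(node, EndNode):
        return node.payoff
    if isinstance(node, ChanceNode):
        # Chance node: probability-weighted average of its branches.
        return sum(p * expected_value(child) for p, child in node.branches)
    # Decision node: the decision maker takes the best available option.
    return max(expected_value(child) for _, child in node.options)


def best_option(node: DecisionNode) -> str:
    """Label of the option with the highest expected value."""
    label, _ = max(node.options, key=lambda option: expected_value(option[1]))
    return label


# Hypothetical decision: launch a student-retention program or do nothing.
tree = DecisionNode(options=[
    ("launch program", ChanceNode(branches=[(0.75, EndNode(80.0)),
                                            (0.25, EndNode(-40.0))])),
    ("do nothing", EndNode(45.0)),
])
print(expected_value(tree))  # 50.0
print(best_option(tree))     # launch program
```

Because each node type has its own evaluation rule, nested trees (a decision that follows a chance event, for instance) are handled by the same recursion without extra code.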