The Magic of Data Mining: A Conceptual Study

DATA & INFORMATION
- Data are Raw Facts. The word 'raw' indicates that the facts have not yet been processed to reveal their meaning.
- Information is the result of processing raw data to reveal its meaning. Data Processing can be as simple as organizing data to REVEAL PATTERNS or as complex as making FORECASTS or DRAWING INFERENCES using Statistical Modelling.
- Knowledge represents highly specialized information that helps an organization make Strategic Decisions.
- A Database System consists of logically related data stored in a single logical Data Repository.
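Here is a minimal illustration of turning data into information. The sketch below takes a handful of raw sales records and organizes them into per-product and per-region totals, which is the simplest kind of processing that can reveal a pattern. The records are hypothetical values invented for this example.

```python
from collections import defaultdict

# Hypothetical raw facts: (region, product, units_sold), not yet processed.
raw_sales = [
    ("North", "TV", 3),
    ("South", "TV", 5),
    ("North", "Soda", 40),
    ("North", "TV", 2),
    ("South", "Soda", 25),
]

def summarize_by(records, key_index):
    """Total the units sold (field 2) for each distinct value of the chosen key field."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)

units_by_product = summarize_by(raw_sales, 1)  # group by product name
units_by_region = summarize_by(raw_sales, 0)   # group by region
print(units_by_product)
print(units_by_region)
```

Neither total is stored in the raw records. Both appear only after processing, which is the distinction the definitions above draw between data and information.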
EXTRACTING 'KNOWLEDGE' FROM 'DATA'
To put data mining in perspective, look at the pyramid in the figure below, which represents how knowledge is extracted from data.
(I) Data forms the pyramid’s base and represents what most organizations collect in their operational databases.
(II) The second level contains Information, which represents the purified and processed data. Information forms the basis for decision making and business understanding.
(III) Knowledge is found at the pyramid’s apex and represents highly specialized information.
DATA: AN ORGANIZATIONAL ASSET
CONCEPT OF DATA MINING
A typical Data Analysis Tool relies on the end users to define the problem, select the data, and initiate the appropriate data analyses to generate the information that helps model and solve problems that the end users uncover.
In other words, the End User REACTS to an external stimulus: the discovery of the problem itself. If the End User fails to detect a problem, no action is taken.
Given that limitation, some current Business Intelligence (BI) environments now support various types of Automated Alerts.
The ALERTS are Software Agents that constantly monitor certain parameters, such as sales indicators and inventory levels, and then perform specified actions (send e-mail or alert messages, run programs, and so on) when such parameters reach predefined levels.
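The following is a rough Python sketch of such an alert agent. The parameter names (`inventory_level`, `weekly_sales`), the thresholds, and the `send_email` action are hypothetical placeholders. A real BI product would supply its own monitoring and notification mechanisms.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AlertRule:
    parameter: str                       # name of the monitored indicator (hypothetical)
    threshold: float                     # the predefined level
    below: bool                          # True: fire when the value drops below the threshold
    action: Callable[[str, float], str]  # what to do when the rule fires

def send_email(parameter: str, value: float) -> str:
    # Placeholder action: only builds a message string; nothing is actually sent.
    return f"EMAIL: {parameter} reached {value}"

def check_alerts(readings: dict, rules: list) -> list:
    """Evaluate every rule against the current readings and return the actions taken."""
    fired = []
    for rule in rules:
        value = readings.get(rule.parameter)
        if value is None:
            continue  # parameter not reported in this reading
        triggered = value < rule.threshold if rule.below else value > rule.threshold
        if triggered:
            fired.append(rule.action(rule.parameter, value))
    return fired

rules = [
    AlertRule("inventory_level", 100, below=True, action=send_email),
    AlertRule("weekly_sales", 50_000, below=False, action=send_email),
]
print(check_alerts({"inventory_level": 80, "weekly_sales": 42_000}, rules))
```

The end user still had to decide which parameters to watch and at what level. The alert therefore automates the reaction, but the tool remains reactive.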
In contrast to the traditional (reactive) BI tools, Data Mining is PROACTIVE. Instead of having the End User define the problem, select the data, and select the tools to analyze the data, Data-mining Tools automatically search the data for anomalies and possible relationships, thereby identifying problems that have not yet been identified by the end user.
In other words, Data Mining refers to the activities that analyze the data, uncover problems or opportunities "HIDDEN" in data relationships, form computer models based on their findings, and then use the models to predict Business Behaviour, requiring minimal end-user intervention.
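The sketch below makes the proactive idea concrete. Instead of being told which column to inspect, the function scans every numeric column and flags values that lie far from the column mean. The z-score rule and the limit of 2 standard deviations are assumptions chosen for simplicity, and the daily figures are hypothetical. Real data-mining tools use much richer methods.

```python
from statistics import mean, pstdev

# Hypothetical daily figures; nobody has said which column might be unusual.
data = {
    "orders": [100, 102, 98, 101, 99, 100, 103, 97],
    "refunds": [2, 3, 2, 4, 3, 2, 3, 40],
}

def find_anomalies(table, z_limit=2.0):
    """Scan every column and return (column, row, value) for each value whose
    z-score (distance from the column mean in standard deviations) exceeds z_limit."""
    findings = []
    for column, values in table.items():
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # a constant column has nothing to flag
        for row, value in enumerate(values):
            if abs(value - mu) / sigma > z_limit:
                findings.append((column, row, value))
    return findings

print(find_anomalies(data))
```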
DATA MINING AS A PART OF THE KNOWLEDGE DISCOVERY PROCESS
The Knowledge Discovery Process comprises SIX PHASES:
- Data Selection
- Data Cleansing
- Enrichment (or Integration)
- Data Transformation or Encoding
- Data Mining
- The Reporting & Display of the Discovered Information.
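The six phases can be read as a pipeline in which each step's output feeds the next. The sketch below chains them over a few customer records. The fields, the region-to-income lookup used for enrichment, and the 100-unit spend threshold are all hypothetical and were invented for illustration.

```python
# Hypothetical customer records; None marks a missing value.
raw = [
    {"id": 1, "age": 34, "region": "north", "spend": 120.0},
    {"id": 2, "age": None, "region": "south", "spend": 80.0},
    {"id": 3, "age": 51, "region": "North", "spend": 300.0},
    {"id": 4, "age": 29, "region": "south", "spend": 95.0},
]
REGION_INCOME = {"north": "high", "south": "medium"}  # assumed external data source

def select(records):     # 1. Data selection: keep only the fields of interest
    return [{k: r[k] for k in ("age", "region", "spend")} for r in records]

def cleanse(records):    # 2. Cleansing: drop incomplete rows, normalize inconsistent case
    return [{**r, "region": r["region"].lower()} for r in records if r["age"] is not None]

def enrich(records):     # 3. Enrichment: integrate data from another source
    return [{**r, "income_band": REGION_INCOME[r["region"]]} for r in records]

def transform(records):  # 4. Transformation: encode spend as a yes/no category
    return [{**r, "big_spender": r["spend"] >= 100} for r in records]

def mine(records):       # 5. Data mining: share of big spenders in each income band
    result = {}
    for band in sorted({r["income_band"] for r in records}):
        group = [r for r in records if r["income_band"] == band]
        result[band] = sum(r["big_spender"] for r in group) / len(group)
    return result

def report(findings):    # 6. Reporting and display of the discovered information
    return [f"{band}: {share:.0%} big spenders" for band, share in findings.items()]

findings = mine(transform(enrich(cleanse(select(raw)))))
print(report(findings))
```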

GOALS OF DATA MINING AND KNOWLEDGE DISCOVERY
- PREDICTION: Data mining can show how certain attributes within the data will behave in the future.
- IDENTIFICATION: Data Patterns can be used to identify the existence of an item, an event, or an activity.
- CLASSIFICATION: Data Mining can partition the data so that different classes or categories can be identified based on combinations of parameters.
- OPTIMIZATION: One eventual goal of Data Mining may be to optimize the use of limited resources such as time, space, money, or materials and to maximize output variables such as sales or profits under a given set of constraints.
PHASES OF DATA MINING
Despite the lack of precise standards, Data Mining generally proceeds through four phases:
PH.I - DATA PREPARATION PHASE
In the Data Preparation Phase, the main data sets to be used by the data-mining operation are identified and cleansed of any data impurities.
Because the data in the data warehouse are already integrated and filtered, the data warehouse usually is the target set for data-mining operations.
PH.II - DATA ANALYSIS & CLASSIFICATION PHASE
The Data Analysis and Classification Phase studies the data to identify common data characteristics or patterns. During this phase, the Data-mining Tool applies specific algorithms to find:
- Data Groupings, Classifications, Clusters, or Sequences.
- Data Dependencies, Links, or Relationships.
- Data Patterns, Trends, and Deviations.
PH.III - KNOWLEDGE ACQUISITION PHASE
The Knowledge Acquisition Phase uses the results of the Data Analysis and Classification Phase. During the Knowledge Acquisition Phase, the Data-mining Tool (with possible intervention by the end user) selects the appropriate Modelling or Knowledge Acquisition Algorithms, such as Neural Networks, Decision Trees, or Rule Induction.
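Rule induction can be illustrated with OneR, one of the simplest rule-induction algorithms. It is used here only as an example, since the text does not name a specific algorithm. For each attribute, OneR maps every value to its most frequent class. It then keeps the attribute whose rules misclassify the fewest training rows. The customer attributes and records below are hypothetical.

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """OneR rule induction: return (attribute, value->class rules, training errors)
    for the single attribute whose majority-class rules make the fewest errors."""
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        counts = defaultdict(Counter)  # attribute value -> class frequencies
        for row in rows:
            counts[row[attr]][row[target]] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

# Hypothetical training data.
customers = [
    {"used_card_recently": "no", "age_band": "young", "cancelled": "yes"},
    {"used_card_recently": "no", "age_band": "old", "cancelled": "yes"},
    {"used_card_recently": "yes", "age_band": "young", "cancelled": "no"},
    {"used_card_recently": "yes", "age_band": "old", "cancelled": "no"},
    {"used_card_recently": "no", "age_band": "young", "cancelled": "no"},
]
attr, rules, errors = one_r(customers, "cancelled")
print(attr, rules, errors)
```

The learned rule says: if the card was not used recently, predict cancellation; otherwise predict no cancellation. It makes one error on the training rows.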
PH.IV - PROGNOSIS PHASE
Although many data-mining tools stop at the knowledge-acquisition phase, others continue to the Prognosis Phase. In that phase, the data-mining findings are used to predict future behaviour and forecast business outcomes.
Examples of data-mining findings include:
- Sixty-five percent of customers who did not use a particular credit card in the last six months are 88 percent likely to cancel that account.
- Eighty-two percent of customers who bought a 42-inch or larger LCD TV are 90 percent likely to buy an entertainment center within the next four weeks.
KNOWLEDGE DISCOVERED DURING DATA MINING
The knowledge discovered during data mining can be described in FIVE ways:
(I) Association Rules: These rules correlate the presence of a set of items with another range of values for another set of variables.
Example: When a female retail shopper buys a handbag, she is likely to buy shoes.
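Association rules are usually judged by two measures. Support is how often the items appear together. Confidence is how often the consequent appears when the antecedent does. The sketch below computes both for the rule {handbag} => {shoes} over five hypothetical shopping baskets.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A | C) / support(A)."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"handbag", "shoes"},
    {"handbag", "shoes", "scarf"},
    {"handbag", "belt"},
    {"shoes"},
    {"scarf", "belt"},
]
rule_from, rule_to = {"handbag"}, {"shoes"}
print(support(baskets, rule_from | rule_to))      # 2 of the 5 baskets
print(confidence(baskets, rule_from, rule_to))    # 2 of the 3 handbag baskets
```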
(II) Classification Hierarchies: The goal is to work from an existing set of events or transactions to create a hierarchy of classes.
Example: A population may be divided into five ranges of creditworthiness based on a history of previous credit transactions.
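One simple way to derive five credit ranges from past transactions is sketched below. Each customer is scored by the share of on-time payments, and the population's own quintiles then set the range boundaries. The scoring rule, the class names, and the customer histories are assumptions for illustration. The result is a flat five-way split rather than a deeper hierarchy.

```python
from bisect import bisect_right
from statistics import quantiles

CLASSES = ["very poor", "poor", "fair", "good", "excellent"]  # hypothetical labels

def on_time_ratio(history):
    """history: list of booleans, True when a past payment was made on time."""
    return sum(history) / len(history)

def build_ranges(population):
    """Four cut points that split the population's scores into five ranges (quintiles)."""
    return quantiles([on_time_ratio(h) for h in population.values()], n=5)

def credit_class(history, cutoffs):
    # bisect_right counts how many cut points the score exceeds (or equals): 0..4.
    return CLASSES[bisect_right(cutoffs, on_time_ratio(history))]

# Hypothetical histories: customer_k paid k of their last 10 bills on time.
population = {f"customer_{k}": [True] * k + [False] * (10 - k) for k in range(1, 11)}
cutoffs = build_ranges(population)
classes = {name: credit_class(h, cutoffs) for name, h in population.items()}
print(classes["customer_1"], classes["customer_10"])
```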
(III) Sequential Patterns: A sequence of actions or events is sought.
Example: If a patient underwent cardiac bypass surgery for blocked arteries and an aneurysm and later developed high blood urea within a year of surgery, he or she is likely to suffer from kidney failure within the next 18 months.
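Checking whether one patient's history contains such a sequence can be sketched as an ordered search with a time limit between consecutive steps. The event names, the month-based timeline, and the three patient histories below are hypothetical. A sequential-pattern miner would run a check like this across many candidate sequences to find those that occur frequently.

```python
def occurs(events, pattern, max_gaps, start=0, prev_month=None):
    """Return True if the events in `pattern` appear in order in `events`.

    events: list of (month, event_name) tuples sorted by month.
    max_gaps[k]: maximum months allowed between pattern[k] and pattern[k + 1].
    """
    if not pattern:
        return True
    for i in range(start, len(events)):
        month, name = events[i]
        if prev_month is not None and month - prev_month > max_gaps[0]:
            break  # events are sorted, so every later event is even further away
        if name == pattern[0]:
            # max_gaps[0] bounded the step just matched, unless this was the first event.
            remaining_gaps = max_gaps if prev_month is None else max_gaps[1:]
            if occurs(events, pattern[1:], remaining_gaps, i + 1, month):
                return True
    return False

PATTERN = ["bypass_surgery", "high_blood_urea", "kidney_failure"]
GAPS = [12, 18]  # within a year of surgery, then within the next 18 months

histories = [
    [(0, "bypass_surgery"), (8, "high_blood_urea"), (20, "kidney_failure")],
    [(0, "bypass_surgery"), (15, "high_blood_urea"), (20, "kidney_failure")],
    [(0, "bypass_surgery"), (6, "high_blood_urea")],
]
results = [occurs(h, PATTERN, GAPS) for h in histories]
print(results)
```

The search backtracks instead of greedily taking the first match, so an early event that is too far from the next step does not hide a later one that fits.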
(IV) Patterns within Time Series: Similarities can be detected within positions of a time series of data, which is a sequence of data taken at regular intervals, such as daily sales or daily closing stock prices.
Example: Two products show the same selling pattern in summer but a different one in winter.
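One simple way to test a claim like this is to compare the two products' sales season by season using the Pearson correlation coefficient. Values near +1 mean the series move together, and values near -1 mean they move in opposite directions. The weekly sales figures are invented. Correlation is also only one of many possible similarity measures for time series.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical weekly unit sales for two products.
summer_a, summer_b = [10, 12, 15, 14, 18], [20, 24, 30, 28, 36]
winter_a, winter_b = [8, 9, 7, 10, 6], [5, 4, 6, 3, 7]

summer_r = pearson(summer_a, summer_b)
winter_r = pearson(winter_a, winter_b)
print(round(summer_r, 2), round(winter_r, 2))
```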
(V) Clustering: A given population of events or items can be partitioned (segmented) into sets of "similar" elements.
Example: An entire population of treatment data on a disease may be divided into groups based on the similarity of side effects produced.
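Clustering of this kind is often done with k-means (Lloyd's algorithm), which is used here as one representative technique. Each record is assigned to its nearest centroid, and each centroid then moves to the mean of its records. These two steps repeat until the centroids stop changing. The two side-effect scores per patient are hypothetical. Seeding with the first k points is a simplification; k-means++ seeding is the more usual choice.

```python
from math import dist

def k_means(points, k, max_rounds=100):
    """Lloyd's k-means, seeded with the first k points so the result is reproducible."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_rounds):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[c]
            for c, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return clusters

# Hypothetical (nausea, fatigue) side-effect scores on a 0-10 scale, one per patient.
patients = [(1, 2), (8, 9), (2, 1), (9, 8), (1, 1), (8, 8)]
clusters = k_means(patients, k=2)
print(clusters)
```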
Closing Comments:
Some of the data-mining findings might fall outside the boundaries of what business managers expect. For example, a data-mining tool might find a close relationship between a customer’s favourite brand of soda and the brand of tires on the customer’s car. Clearly, that relationship might not be held in high regard among sales managers. (In Regression Analysis, those relationships are commonly described by the label “Idiot Correlation.”)
Fortunately, data mining usually yields more meaningful results. In fact, data mining has proved to be very helpful in finding practical relationships among data that help define customer buying patterns, improve product development and acceptance, reduce healthcare fraud, analyze stock markets, and so on.
Written by Sameer