The Magic of Data Mining: A Conceptual Study



DATA & INFORMATION

  • Data are Raw Facts. The word raw indicates that the facts have not yet been processed to reveal their meaning.
  • Information is the result of processing raw data to reveal its meaning. Data Processing can be as simple as organizing data to REVEAL PATTERNS or as complex as making FORECASTS or DRAWING INFERENCES using Statistical Modelling.
  • Knowledge represents highly specialized information that helps an organization make Strategic Decisions.
  • A Database System consists of logically related data stored in a single logical Data Repository.


EXTRACTING 'KNOWLEDGE' FROM 'DATA'

To put data mining in perspective, look at the pyramid in the figure below, which represents how knowledge is extracted from data.

(I) Data forms the pyramid base and represents what most organizations collect in their operational databases.

(II) The second level contains Information that represents the purified and processed data. Information forms the basis for decision making and business understanding.

(III) Knowledge is found at the pyramid’s apex and represents highly specialized information.

                                    



DATA: AN ORGANIZATIONAL ASSET


Data are a valuable resource that can translate into information. If the information is accurate and timely, it is likely to trigger actions that enhance the organization’s competitive position and generate wealth.


In effect, an organization is subject to a DATA-INFORMATION-DECISION CYCLE; that is, the DATA USER applies INTELLIGENCE to DATA to produce INFORMATION that is the basis of KNOWLEDGE used in DECISION MAKING by the user. This CYCLE is illustrated in Fig. below:

 

CONCEPT OF DATA MINING

The purpose of Data Analysis is to discover previously unknown data characteristics, relationships, dependencies, or trends. Such discoveries then become part of the information framework on which decisions are built.

A typical Data Analysis Tool relies on the end users to define the problem, select the data, and initiate the appropriate data analyses to generate the information that helps model and solve problems that the end users uncover.

In other words, the End User REACTS to an external stimulus—the discovery of the problem itself. If the End User fails to detect a problem, no action is taken.

Given that limitation, some current Business Intelligence (BI) environments now support various types of Automated Alerts.

The ALERTS are Software Agents that constantly monitor certain parameters, such as sales indicators and inventory levels, and then perform specified actions (send e-mail or alert messages, run programs, and so on) when such parameters reach predefined levels.
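
The alert behaviour described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular BI product works: the parameter names (`inventory_level`, `daily_sales`), the threshold values, and the `notify` action are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlertRule:
    parameter: str                        # monitored metric (hypothetical name)
    threshold: float                      # predefined level that triggers the alert
    trigger_below: bool                   # True: fire at or below the threshold
    action: Callable[[str, float], str]   # what to do when the rule fires


def notify(parameter: str, value: float) -> str:
    # Stand-in for sending an e-mail or running a program.
    return f"ALERT: {parameter} reached {value}"


def check_alerts(readings: Dict[str, float], rules: List[AlertRule]) -> List[str]:
    """Compare the current readings with each rule and run the actions that fire."""
    fired = []
    for rule in rules:
        value = readings.get(rule.parameter)
        if value is None:
            continue  # no reading for this parameter in this cycle
        if rule.trigger_below:
            breached = value <= rule.threshold
        else:
            breached = value >= rule.threshold
        if breached:
            fired.append(rule.action(rule.parameter, value))
    return fired


# Assumed rules: warn when stock runs low or when sales are unusually high.
rules = [
    AlertRule("inventory_level", 20, True, notify),
    AlertRule("daily_sales", 1000, False, notify),
]
readings = {"inventory_level": 12, "daily_sales": 640}
messages = check_alerts(readings, rules)
print(messages)
```

Note that the rules still have to be written by someone who already knows which parameters matter, so an alert of this kind remains a reactive tool.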

In contrast to the traditional (reactive) BI tools, Data Mining is PROACTIVE. Instead of having the End User define the problem, select the data, and select the tools to analyze the data, Data-mining Tools automatically search the data for anomalies and possible relationships, thereby identifying problems that have not yet been identified by the end user.

In other words, Data Mining refers to the activities that analyze the data, uncover problems or opportunities "HIDDEN" in data relationships, form computer models based on their findings, and then use the models to predict Business Behaviour, all with minimal end-user intervention.


DATA MINING AS A PART OF THE KNOWLEDGE DISCOVERY PROCESS

Knowledge Discovery in Databases, frequently abbreviated as KDD, typically encompasses more than Data Mining.

The Knowledge Discovery Process comprises SIX PHASES:

  1. Data Selection
  2. Data Cleansing
  3. Enrichment (or Integration)
  4. Data Transformation or Encoding
  5. Data Mining
  6. The Reporting & Display of the Discovered Information.
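
To make the six phases concrete, here is a toy sketch in Python in which each phase is one small function. Everything in it is assumed for illustration: the record fields, the city-to-region lookup, the "high"/"low" spending bands, and the simple count that stands in for a real mining algorithm.

```python
def select(records):
    # 1. Data Selection: keep only the fields relevant to the question.
    return [
        {"customer": r.get("customer"), "city": r.get("city"), "amount": r.get("amount")}
        for r in records
    ]


def cleanse(records):
    # 2. Data Cleansing: drop incomplete records and normalise city spelling.
    return [
        dict(r, city=r["city"].strip().title())
        for r in records
        if r["customer"] and r["city"] and r["amount"] is not None
    ]


def enrich(records, regions):
    # 3. Enrichment: add a region looked up from a second (assumed) source.
    return [dict(r, region=regions.get(r["city"], "Unknown")) for r in records]


def transform(records):
    # 4. Data Transformation or Encoding: encode the amount as a band.
    return [dict(r, band="high" if r["amount"] >= 100 else "low") for r in records]


def mine(records):
    # 5. Data Mining: count high-value purchases per region
    #    (a stand-in for a real mining algorithm).
    counts = {}
    for r in records:
        if r["band"] == "high":
            counts[r["region"]] = counts.get(r["region"], 0) + 1
    return counts


def report(findings):
    # 6. Reporting and Display of the discovered information.
    return [f"{region}: {n} high-value purchase(s)" for region, n in sorted(findings.items())]


raw = [
    {"customer": "A", "city": " pune ", "amount": 120, "note": "x"},
    {"customer": "B", "city": "Delhi", "amount": None},
    {"customer": "C", "city": "delhi", "amount": 300},
    {"customer": "D", "city": "Pune", "amount": 40},
]
regions = {"Pune": "West", "Delhi": "North"}

lines = report(mine(transform(enrich(cleanse(select(raw)), regions))))
print(lines)
```

The point of the sketch is the ordering: mining is only the fifth step, and it works on data that the four earlier steps have already selected, cleaned, enriched, and encoded.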

Figure: Stages of the Knowledge Discovery in Databases process.


GOALS OF DATA MINING AND KNOWLEDGE DISCOVERY

Data Mining is typically carried out with some end goals or applications. Broadly speaking, these goals fall into the following classes: Prediction, Identification, Classification, and Optimization.

  1. PREDICTION: Data mining can show how certain attributes within the data will behave in the future.
  2. IDENTIFICATION: Data Patterns can be used to identify the existence of an item, an event, or an activity.
  3. CLASSIFICATION: Data Mining can partition the data so that different classes or categories can be identified based on combinations of parameters.
  4. OPTIMIZATION: One eventual goal of Data Mining may be to optimize the use of limited resources such as time, space, money, or materials and to maximize output variables such as sales or profits under a given set of constraints.

 

PHASES OF DATA MINING

In spite of the lack of precise standards, Data Mining is subject to four general phases:


PH.I-DATA PREPARATION PHASE

In the Data Preparation Phase, the main data sets to be used by the data-mining operation are identified and cleansed of any data impurities.

Because the data in the data warehouse are already integrated and filtered, the data warehouse usually is the target set for data-mining operations.


PH.II-DATA ANALYSIS & CLASSIFICATION PHASE

The Data Analysis and Classification Phase studies the data to identify common data characteristics or patterns. During this phase, the Data-mining Tool applies specific algorithms to find:

  • Data Groupings, Classifications, Clusters, or Sequences.
  • Data Dependencies, Links, or Relationships.
  • Data Patterns, Trends, and Deviations.
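
As one small illustration of finding deviations, the sketch below flags values that lie far from the mean of a series. The z-score test and the cut-off of 2.0 are assumptions chosen for the example, not a method prescribed by any data-mining tool, and the sales figures are made up.

```python
from statistics import mean, pstdev


def find_deviations(values, z_threshold=2.0):
    """Return (index, value) pairs lying more than z_threshold
    standard deviations away from the mean of the series."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing deviates
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > z_threshold
    ]


# Hypothetical daily sales with one unusual day.
daily_sales = [100, 102, 98, 101, 99, 103, 97, 100, 180, 101]
outliers = find_deviations(daily_sales)
print(outliers)
```

Here only the ninth value (index 8) is reported; the other days stay well within one standard deviation of the mean.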

 

PH.III-KNOWLEDGE ACQUISITION PHASE

The Knowledge Acquisition Phase uses the results of the Data Analysis and Classification Phase. During the Knowledge Acquisition Phase, the Data-mining Tool (with possible intervention by the end user) selects the appropriate Modelling or Knowledge Acquisition Algorithms, such as Neural Networks, Decision Trees, or Rule Induction.

 

PH.IV-PROGNOSIS PHASE

Although many data-mining tools stop at the knowledge-acquisition phase, others continue to the Prognosis Phase.

In that phase, the data-mining findings are used to predict future behaviour and forecast business outcomes.

Examples of data-mining findings can be:

  1. Sixty-five percent of customers who did not use a particular credit card in the last six months are 88 percent likely to cancel that account.
  2. Eighty-two percent of customers who bought a 42-inch or larger LCD TV are 90 percent likely to buy an entertainment center within the next four weeks.


KNOWLEDGE DISCOVERED DURING DATA MINING

The knowledge discovered during data mining can be described in FIVE ways:

(I) Association Rules: These rules correlate the presence of a set of items with another range of values for another set of variables.

Example: When a female retail shopper buys a handbag, she is likely to buy shoes.
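
The strength of a rule such as "handbag implies shoes" is usually expressed with two standard measures, support and confidence, which this article does not name but which the sketch below computes. The shopping baskets are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction
    that also contain the consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical shopping baskets.
baskets = [
    {"handbag", "shoes"},
    {"handbag", "shoes", "scarf"},
    {"handbag"},
    {"shoes"},
    {"scarf"},
]

rule_support = support(baskets, {"handbag", "shoes"})
rule_confidence = confidence(baskets, {"handbag"}, {"shoes"})
print(rule_support, round(rule_confidence, 2))
```

In this made-up data, two of the five baskets contain both items, and two of the three handbag buyers also bought shoes.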

 

(II) Classification Hierarchies: The goal is to work from an existing set of events or transactions to create a hierarchy of classes.

Example: A population may be divided into five ranges of credit worthiness based on a history of previous credit transactions.
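
A very simplified sketch of the credit example follows. It assigns a customer to one of five ranges from the share of past payments made on time. The band names and cut-off values are assumptions fixed by hand; a real data-mining tool would derive the classes from the transaction history rather than hard-code them.

```python
from bisect import bisect_right

# Assumed cut-offs on the on-time payment ratio; they split [0, 1] into five ranges.
BAND_EDGES = [0.5, 0.7, 0.85, 0.95]
BAND_NAMES = ["very poor", "poor", "fair", "good", "excellent"]


def credit_band(payments):
    """Classify a customer from a history of payments.

    `payments` is a list of booleans, True meaning the payment was on time.
    """
    if not payments:
        return "unrated"  # no history to classify
    ratio = sum(payments) / len(payments)
    # bisect_right counts how many cut-offs the ratio has reached or passed.
    return BAND_NAMES[bisect_right(BAND_EDGES, ratio)]


print(credit_band([True] * 9 + [False]))   # 90% of payments on time
print(credit_band([True, False]))          # 50% of payments on time
```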


(III) Sequential Patterns: A sequence of actions or events is sought.

Example: If a patient underwent cardiac bypass surgery for blocked arteries and an aneurysm and later developed high blood urea within a year of surgery, he or she is likely to suffer from kidney failure within the next 18 months.


(IV) Patterns within Time Series: Similarities can be detected within positions of a time series of data, which is a sequence of data taken at regular intervals such as daily sales or daily closing stock prices.

Example: Two products show the same selling pattern in summer but a different one in winter.
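
One simple way to compare the selling patterns of two products is to compute the Pearson correlation of their sales, season by season. The sketch below uses invented figures, and Pearson correlation is only one of several possible similarity measures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no pattern to compare
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly sales of products A and B.
summer_a = [10, 12, 15, 18, 20]
summer_b = [20, 24, 30, 36, 40]   # rises in step with A
winter_a = [20, 18, 15, 12, 10]
winter_b = [5, 7, 9, 12, 14]      # rises while A falls

summer_r = pearson(summer_a, summer_b)
winter_r = pearson(winter_a, winter_b)
print(round(summer_r, 2), round(winter_r, 2))
```

A value close to +1 in summer and close to -1 in winter is the numerical form of "the same selling pattern in summer but a different one in winter".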


(V) Clustering: A given population of events or items can be partitioned (segmented) into sets of "similar" elements.

Example: An entire population of treatment data on a disease may be divided into groups based on the similarity of side effects produced.
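
As a rough sketch of clustering, the code below runs a one-dimensional k-means on a list of numbers. The choice of k-means, the "side-effect severity" scores, and the two starting centroids are all assumptions made for the example; real treatment data would have many more dimensions.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, then move each
    centroid to the mean of its group, until nothing changes."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]   # keep an empty cluster's centroid
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical side-effect severity scores for eight patients.
side_effect_scores = [1, 2, 2, 3, 9, 10, 11, 10]
centroids, clusters = kmeans_1d(side_effect_scores, centroids=[0, 5])
print(centroids, clusters)
```

The patients fall into a mild group and a severe group without anyone having defined those groups in advance, which is what distinguishes clustering from classification.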


Closing Comments: 

Some of the data-mining findings might fall outside the boundaries of what business managers expect. For example, a data-mining tool might find a close relationship between a customer’s favourite brand of soda and the brand of tires on the customer’s car. Clearly, that relationship might not be held in high regard among sales managers. (In Regression Analysis, those relationships are commonly described by the label “Idiot Correlation.”)

Fortunately, data mining usually yields more meaningful results. In fact, data mining has proved to be very helpful in finding practical relationships among data that help define customer buying patterns, improve product development and acceptance, reduce healthcare fraud, analyze stock markets, and so on.


                        Written by- Sameer
