May 24, 2020

The Magic of Data Mining: A Conceptual Study


DATA & INFORMATION

Data are raw facts. The word "raw" indicates that the facts have not yet been processed to reveal their meaning.
Information is the result of processing raw data to reveal its meaning. Data processing can be as simple as organizing data to REVEAL PATTERNS or as complex as making FORECASTS or DRAWING INFERENCES using statistical modelling.
Knowledge represents highly specialized information that helps an organization make strategic decisions.
A database system consists of logically related data stored in a single logical data repository.

EXTRACTING 'KNOWLEDGE' FROM 'DATA'

To put data mining in perspective, look at the pyramid in the figure below, which represents how knowledge is extracted from data.

(I) Data form the base of the pyramid and represent what most organizations collect in their operational databases.

(II) The second level contains information, which represents purified and processed data. Information forms the basis for decision making and business understanding.

(III) Knowledge is found at the pyramid’s apex and represents highly specialized information.

DATA: AN ORGANIZATIONAL ASSET

Data are a valuable resource that can translate into information. If the information is accurate and timely, it is likely to trigger actions that enhance the organization’s competitive position and generate wealth.

In effect, an organization is subject to a DATA-INFORMATION-DECISION CYCLE; that is, the DATA USER applies INTELLIGENCE to DATA to produce INFORMATION that is the basis of the KNOWLEDGE used in DECISION MAKING by the user. This cycle is illustrated in the figure below.

CONCEPT OF DATA MINING

The purpose of Data Analysis is to discover previously unknown data characteristics, relationships, dependencies, or trends. Such discoveries then become part of the information framework on which decisions are built.

A typical Data Analysis Tool relies on the end users to define the problem, select the data, and initiate the appropriate data analyses to generate the information that helps model and solve problems that the end users uncover.

In other words, the end user REACTS to an external stimulus: the discovery of the problem itself. If the end user fails to detect a problem, no action is taken.

Given that limitation, some current Business Intelligence (BI) environments now support various types of Automated Alerts.

The ALERTS are Software Agents that constantly monitor certain parameters, such as sales indicators and inventory levels, and then perform specified actions (send e-mail or alert messages, run programs, and so on) when such parameters reach predefined levels.
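A minimal sketch of such an alert agent, written in Python. The parameter names (`inventory_units`, `daily_sales`), the thresholds, and the `notify` action are hypothetical; a real BI product would send an e-mail or run a program instead of returning a message string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Alert:
    """Watches one monitored parameter and fires an action at a predefined level."""
    parameter: str                       # name of the monitored parameter (hypothetical)
    threshold: float                     # the predefined level
    fires_below: bool                    # True: fire when the value drops below the level
    action: Callable[[str, float], str]  # what to do when the alert fires

    def check(self, readings: Dict[str, float]) -> Optional[str]:
        value = readings.get(self.parameter)
        if value is None:                # parameter not reported in this cycle
            return None
        if self.fires_below:
            triggered = value < self.threshold
        else:
            triggered = value > self.threshold
        return self.action(self.parameter, value) if triggered else None


def notify(parameter: str, value: float) -> str:
    # Stand-in for "send e-mail or alert message"
    return f"ALERT: {parameter} reached {value}"


def run_alerts(alerts: List[Alert], readings: Dict[str, float]) -> List[str]:
    """One monitoring pass: check every alert against the latest readings."""
    fired = (alert.check(readings) for alert in alerts)
    return [message for message in fired if message is not None]


alerts = [
    Alert("inventory_units", 50, fires_below=True, action=notify),
    Alert("daily_sales", 1000, fires_below=False, action=notify),
]
print(run_alerts(alerts, {"inventory_units": 40, "daily_sales": 900}))
# → ['ALERT: inventory_units reached 40']
```

An agent would call `run_alerts` on a schedule, which is what makes such alerts automatic rather than dependent on the end user noticing a problem.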

In contrast to the traditional (reactive) BI tools, Data Mining is PROACTIVE. Instead of having the End User define the problem, select the data, and select the tools to analyze the data, Data-mining Tools automatically search the data for anomalies and possible relationships, thereby identifying problems that have not yet been identified by the end user.

In other words, data mining refers to the activities that analyze the data, uncover problems or opportunities "HIDDEN" in data relationships, form computer models based on their findings, and then use the models to predict business behaviour, requiring minimal end-user intervention.

DATA MINING AS A PART OF THE KNOWLEDGE DISCOVERY PROCESS

Knowledge Discovery in Databases, frequently abbreviated as KDD, typically encompasses more than data mining.

The Knowledge Discovery Process comprises SIX PHASES:

Data Selection
Data Cleansing
Enrichment (or Integration)
Data Transformation or Encoding
Data Mining
The Reporting & Display of the Discovered Information.
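The six phases can be pictured as a pipeline in which each phase feeds the next. The Python sketch below is illustrative only: the customer records, the city-to-region lookup used for enrichment, and the choice of "average purchase per region" as the mining step are hypothetical stand-ins, not a prescribed KDD implementation.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, object]


def select(records: List[Record], fields: List[str]) -> List[Record]:
    # 1. Data selection: keep only the attributes relevant to the question
    return [{f: r.get(f) for f in fields} for r in records]


def cleanse(records: List[Record]) -> List[Record]:
    # 2. Data cleansing: drop records with missing values
    return [r for r in records if all(v is not None for v in r.values())]


def enrich(records: List[Record], region_of_city: Dict[str, str]) -> List[Record]:
    # 3. Enrichment/integration: add data from a second (hypothetical) source
    return [{**r, "region": region_of_city.get(r["city"], "unknown")} for r in records]


def transform(records: List[Record]) -> List[Record]:
    # 4. Transformation/encoding: amounts arrive as text, convert them to numbers
    return [{**r, "amount": float(r["amount"])} for r in records]


def mine(records: List[Record]) -> Dict[str, float]:
    # 5. Data mining (stand-in): average purchase amount per region
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for r in records:
        totals[r["region"]] += r["amount"]
        counts[r["region"]] += 1
    return {region: totals[region] / counts[region] for region in totals}


def report(findings: Dict[str, float]) -> List[str]:
    # 6. Reporting and display of the discovered information
    return [f"{region}: average purchase {avg:.2f}" for region, avg in sorted(findings.items())]


raw = [
    {"id": 1, "city": "Pune", "amount": "120", "note": "walk-in"},
    {"id": 2, "city": "Delhi", "amount": "80"},
    {"id": 3, "city": None, "amount": "50"},      # removed during cleansing
    {"id": 4, "city": "Mumbai", "amount": "100"},
]
regions = {"Pune": "West", "Mumbai": "West", "Delhi": "North"}

pipeline = transform(enrich(cleanse(select(raw, ["city", "amount"])), regions))
for line in report(mine(pipeline)):
    print(line)
# North: average purchase 80.00
# West: average purchase 110.00
```

Keeping each phase as a separate function mirrors the list above: any phase can be swapped out (for example, a real mining algorithm in place of the averaging step) without touching the others.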

GOALS OF DATA MINING AND KNOWLEDGE DISCOVERY

Data mining is typically carried out with specific end goals or applications in mind. Broadly speaking, these goals fall into the following classes: Prediction, Identification, Classification, and Optimization.

PREDICTION-Data mining can show how certain attributes within the data will behave in the future.
IDENTIFICATION-Data Patterns can be used to identify the existence of an item, an event, or an activity.
CLASSIFICATION-Data Mining can partition the data so that different classes or categories can be identified based on combinations of parameters.
OPTIMIZATION-One eventual goal of Data Mining may be to optimize the use of limited resources such as time, space, money, or materials and to maximize output variables such as sales or profits under a given set of constraints.

PHASES OF DATA MINING

In spite of the lack of precise standards, Data Mining is subject to four general phases:

PH.I-DATA PREPARATION PHASE

In the Data Preparation Phase, the main data sets to be used by the data-mining operation are identified and cleansed of any data impurities.

Because the data in the data warehouse are already integrated and filtered, the data warehouse usually is the target set for data-mining operations.

PH.II-DATA ANALYSIS & CLASSIFICATION PHASE

The Data Analysis and Classification Phase studies the data to identify common data characteristics or patterns. During this phase, the Data-mining Tool applies specific algorithms to find:

Data Groupings, Classifications, Clusters, or Sequences.
Data Dependencies, Links, or Relationships.
Data Patterns, Trends, and Deviations.
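As one concrete instance of finding "deviations", the sketch below flags values that lie unusually far from the mean, using a z-score test. The z-score technique, the threshold of 2.0, and the sales figures are assumptions chosen for illustration; data-mining tools may use quite different algorithms for this phase.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def find_deviations(values: List[float], z_threshold: float = 2.0) -> List[Tuple[int, float]]:
    """Return (position, value) pairs whose z-score exceeds z_threshold."""
    mu = mean(values)
    sigma = pstdev(values)          # population standard deviation
    if sigma == 0:                  # all values identical: nothing deviates
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]


daily_sales = [100, 102, 98, 101, 99, 100, 250]   # hypothetical daily sales
print(find_deviations(daily_sales))
# → [(6, 250)]
```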

PH.III-KNOWLEDGE ACQUISITION PHASE

The Knowledge Acquisition Phase uses the results of the Data Analysis and Classification Phase. During the Knowledge Acquisition Phase, the data-mining tool (with possible intervention by the end user) selects the appropriate modelling or knowledge-acquisition algorithms, such as neural networks, decision trees, and rule induction.
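Rule induction is one of the algorithm families named above. As a small, swapped-in illustration, the sketch below implements OneR ("one rule"), one of the simplest rule-induction methods: for each attribute it predicts the majority class for every attribute value, then keeps the attribute whose rule makes the fewest errors. The customer records and attribute names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


def one_r(rows: List[Row], target: str) -> Tuple[str, Dict[str, str], int]:
    """Return (attribute, rule mapping value -> predicted class, training errors)."""
    best: Optional[Tuple[str, Dict[str, str], int]] = None
    for attr in rows[0]:
        if attr == target:
            continue
        # Count how often each class occurs for each value of this attribute
        by_value: Dict[str, Counter] = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        # Predict the majority class for each value
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        # Errors = rows whose class differs from the predicted majority class
        errors = sum(sum(counts.values()) - counts[rule[value]] for value, counts in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    if best is None:
        raise ValueError("rows must contain at least one non-target attribute")
    return best


rows = [
    {"card_used_6m": "no", "region": "N", "cancelled": "yes"},
    {"card_used_6m": "no", "region": "S", "cancelled": "yes"},
    {"card_used_6m": "no", "region": "N", "cancelled": "yes"},
    {"card_used_6m": "yes", "region": "S", "cancelled": "no"},
    {"card_used_6m": "yes", "region": "N", "cancelled": "no"},
    {"card_used_6m": "yes", "region": "S", "cancelled": "no"},
]
print(one_r(rows, "cancelled"))
# → ('card_used_6m', {'no': 'yes', 'yes': 'no'}, 0)
```

The learned rule reads "if the card was not used in the last six months, predict cancellation", the kind of model a later prognosis step could apply to new customers.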

PH.IV-PROGNOSIS PHASE

Although many data-mining tools stop at the knowledge-acquisition phase, others continue to the Prognosis Phase.

In that phase, the data-mining findings are used to predict future behaviour and forecast business outcomes.

Examples of data-mining findings can be:

Sixty-five percent of customers who did not use a particular credit card in the last six months are 88 percent likely to cancel that account.
Eighty-two percent of customers who bought a 42-inch or larger LCD TV are 90 percent likely to buy an entertainment center within the next four weeks.

KNOWLEDGE DISCOVERED DURING DATA MINING

This knowledge can be described in FIVE ways:

(I) Association Rules-These rules correlate the presence of a set of items with another range of values for another set of variables.

Example: When a female retail shopper buys a handbag, she is likely to buy shoes.
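Association rules are commonly scored with two measures that the article does not name but that are standard in the literature: support (the fraction of transactions containing all the items) and confidence (among transactions containing the left-hand side, the fraction that also contain the right-hand side). A minimal sketch, with hypothetical shopping baskets:

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions: List[Set[str]], lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """Estimated probability of rhs given lhs: support(lhs and rhs) / support(lhs)."""
    lhs, rhs = set(lhs), set(rhs)
    base = support(transactions, lhs)
    return support(transactions, lhs | rhs) / base if base else 0.0


baskets = [
    {"handbag", "shoes"},
    {"handbag", "shoes", "scarf"},
    {"handbag", "belt"},
    {"shoes"},
    {"scarf"},
]
print(support(baskets, {"handbag", "shoes"}))                  # 0.4
print(round(confidence(baskets, {"handbag"}, {"shoes"}), 2))   # 0.67
```

In words: 2 of the 3 baskets that contain a handbag also contain shoes, so the rule "handbag -> shoes" has a confidence of about 0.67 in this toy data.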

(II) Classification Hierarchies-The goal is to work from an existing set of events or transactions to create a hierarchy of classes.

Example: A population may be divided into five ranges of creditworthiness based on a history of previous credit transactions.
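A flat, simplified version of this example: score each customer by the share of past payments made on time, then split the 0 to 1 score range into five equal-width bands. The on-time-ratio score, the band names, and the equal-width cut points are assumptions for illustration; a real classification hierarchy would typically be learned from the data and could contain nested sub-classes.

```python
from typing import List

# Hypothetical class labels, lowest to highest creditworthiness
BANDS = ["very poor", "poor", "fair", "good", "excellent"]


def on_time_ratio(history: List[bool]) -> float:
    """Share of past payments made on time (True = on time); 0.0 for no history."""
    return sum(history) / len(history) if history else 0.0


def credit_band(history: List[bool]) -> str:
    """Map the ratio in [0, 1] to one of five equal-width ranges."""
    index = min(int(on_time_ratio(history) * len(BANDS)), len(BANDS) - 1)
    return BANDS[index]


print(credit_band([True] * 10))                   # excellent
print(credit_band([True, True, True, False]))     # good
print(credit_band([True, False]))                 # fair
```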

(III) Sequential Patterns-A sequence of actions or events is sought.

Example: If a patient underwent cardiac bypass surgery for blocked arteries and an aneurysm and later developed high blood urea within a year of surgery, he or she is likely to suffer from kidney failure within the next 18 months.
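A sketch of how such a sequence could be checked against patient histories. Each history is a list of (day, event) pairs, and each pattern step may carry a maximum gap, in days, from the previous matched event. The event names, the reading of "within the next 18 months" as "within 548 days of the previous event", and the three histories are hypothetical. Full sequential-pattern miners (for example GSP or PrefixSpan) discover such patterns automatically; this sketch only checks a given one.

```python
from typing import List, Optional, Tuple

Event = Tuple[int, str]              # (day number, event name)
Step = Tuple[str, Optional[int]]     # (event name, max days after previous step)


def occurs_in_order(events: List[Event], pattern: List[Step]) -> bool:
    """True if the pattern's events occur in order, each within its allowed gap."""
    events = sorted(events)

    def match(start: int, k: int, prev_day: Optional[int]) -> bool:
        if k == len(pattern):
            return True
        name_k, max_gap = pattern[k]
        for i in range(start, len(events)):
            day, name = events[i]
            if prev_day is not None and max_gap is not None and day - prev_day > max_gap:
                break  # events are sorted, so every later event is even further away
            if name == name_k and match(i + 1, k + 1, day):
                return True
        return False

    return match(0, 0, None)


def sequence_support(histories: List[List[Event]], pattern: List[Step]) -> float:
    """Fraction of histories (e.g. patients) in which the pattern occurs."""
    return sum(occurs_in_order(h, pattern) for h in histories) / len(histories)


PATTERN = [("bypass surgery", None), ("high blood urea", 365), ("kidney failure", 548)]
histories = [
    [(0, "bypass surgery"), (200, "high blood urea"), (600, "kidney failure")],  # matches
    [(0, "bypass surgery"), (500, "high blood urea"), (700, "kidney failure")],  # urea too late
    [(10, "bypass surgery"), (100, "high blood urea")],                          # no kidney failure
]
print([occurs_in_order(h, PATTERN) for h in histories])   # [True, False, False]
print(round(sequence_support(histories, PATTERN), 2))     # 0.33
```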

(IV) Patterns within Time Series-Similarities can be detected within positions of a time series of data, which is a sequence of data taken at regular intervals such as daily sales or daily closing stock prices.

Example: Two products show the same selling pattern in summer but a different one in winter.
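One simple way to detect the kind of similarity in this example is to split each product's sales series into seasonal segments and compare the segments with the Pearson correlation coefficient. The correlation measure, the 0.8 threshold, and the weekly sales figures are assumptions for illustration.

```python
from math import sqrt
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def similar(xs: List[float], ys: List[float], threshold: float = 0.8) -> bool:
    """Treat two segments as 'the same selling pattern' if they are strongly correlated."""
    return pearson(xs, ys) >= threshold


# Hypothetical weekly unit sales for two products
summer_a, summer_b = [10, 20, 30, 40], [15, 25, 35, 45]
winter_a, winter_b = [40, 30, 20, 10], [10, 20, 30, 40]

print(similar(summer_a, summer_b))   # True  (the two series rise together)
print(similar(winter_a, winter_b))   # False (one falls while the other rises)
```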

(V) Clustering-A given population of events or items can be partitioned (segmented) into sets of "similar" elements.

Example: An entire population of treatment data on a disease may be divided into groups based on the similarity of side effects produced.
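A common way to form such groups is k-means clustering, which the article does not name and which is used here only as an example. In this sketch each patient is described by hypothetical side-effect severity scores (nausea, fatigue) on a 0 to 10 scale; the number of groups k and the naive choice of the first k patients as starting centres are assumptions.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iterations: int = 20) -> Tuple[List[Tuple[float, ...]], List[int]]:
    """Plain k-means: returns final centroids and a cluster label per point."""
    centroids = [tuple(p) for p in points[:k]]          # naive initialization
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:                        # converged
            break
        centroids = updated
    labels = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
    return centroids, labels


patients = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
_, labels = kmeans(patients, k=2)
print(labels)   # [0, 1, 0, 0, 1, 1]
```

Patients with mild side effects end up in one group and those with severe side effects in the other; with real data, a better initialization (for example, k-means++) is usually preferred.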

Closing Comments:

Some data-mining findings might fall outside the boundaries of what business managers expect. For example, a data-mining tool might find a close relationship between a customer’s favourite brand of soda and the brand of tires on the customer’s car. Clearly, that relationship might not be held in high regard among sales managers. (In regression analysis, such relationships are commonly described by the label “idiot correlation.”)

Fortunately, data mining usually yields more meaningful results. In fact, data mining has proved to be very helpful in finding practical relationships among data that help define customer buying patterns, improve product development and acceptance, reduce healthcare fraud, analyze stock markets, and so on.

Written by Sameer

Sameer Speaks