Chapter 12 (part 2) Data Mining
Instructor: Paul Chen
Descriptive (Operational, OLTP): The dealer sold 200 cars last month.
Explanatory (Traditional DW, OLAP): For every 1% increase in the interest rate, auto sales decrease by 5%.
Predictive (Data Mining): Predictions about future buyer behavior.
Data Mining and OLAP
They are two separate breeds of analysis with entirely different objectives, not to mention tools, skill sets, and implementation methods.
Data Mining
With canned reports, ad hoc querying, and OLAP, the end user defines a hypothesis and determines which data to examine. With data mining, the tool identifies the hypothesis, and it actually tells the user where in the data to start the exploration process.
Data Mining
Rather than using SQL to filter out values and methodically reduce the data into a concise answer set, data mining uses algorithms that exhaustively review the relationships among data elements to determine if any patterns exist. The whole purpose of data mining is to yield new business information that a business person can act on.
Data Mining Tools
Data mining tools are typically classified by the type of algorithm they use to identify hidden patterns. There are many different algorithms in use, but the four most popular are association, sequence, clustering (or segmentation), and predictive modeling.
Data Mining Tools
ASSOCIATION
Association, also frequently referred to as "affinity analysis," reviews numerous sets of items and looks for common groupings. An example of association is market basket analysis, which involves reviewing the products that consumers purchase in a single trip to the grocery store.
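As a rough illustration of market basket analysis, the sketch below counts how often each pair of products appears in the same basket. The baskets and product names are made up for the example, and real tools use more scalable algorithms than this brute-force pair count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set holds the products bought in one store trip.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"diapers", "beer"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)  # ('bread', 'milk') 3
```

Pairs with high counts are candidates for the "common groupings" that association looks for.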
ASSOCIATION
Finds items that imply the presence of other items in the same event.
Affinities between items are represented by association rules.
– e.g. ‘When a customer rents property for more than 2 years and is more than 25 years old, in 40% of cases, the customer will buy a property. This association happens in 35% of all customers who rent properties’.
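The two percentages in a rule like this are usually called confidence (how often the conclusion holds when the condition holds) and support (how often the whole rule occurs in the data). The sketch below computes both under those standard definitions; the renter records and field names are hypothetical, and the slide's exact definition of the 35% figure may differ.

```python
# Hypothetical renter records; the field names are illustrative only.
renters = [
    {"years_renting": 3, "age": 30, "bought": True},
    {"years_renting": 4, "age": 41, "bought": False},
    {"years_renting": 1, "age": 28, "bought": False},
    {"years_renting": 5, "age": 52, "bought": True},
    {"years_renting": 3, "age": 22, "bought": True},
    {"years_renting": 6, "age": 35, "bought": False},
    {"years_renting": 2, "age": 45, "bought": False},
    {"years_renting": 7, "age": 60, "bought": False},
]

def antecedent(r):
    """Condition side of the rule: rents for more than 2 years and is over 25."""
    return r["years_renting"] > 2 and r["age"] > 25

def consequent(r):
    """Conclusion side of the rule: the customer bought a property."""
    return r["bought"]

def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n_antecedent = sum(1 for r in records if antecedent(r))
    n_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_both / len(records)  # share of all records matching the whole rule
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

support, confidence = rule_metrics(renters, antecedent, consequent)
print(f"support={support:.0%} confidence={confidence:.0%}")  # support=25% confidence=40%
```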
Data Mining Tools
SEQUENCE
Sequential analysis helps data miners identify a set of order-specific items or events. Association identifies the existence of patterns or groups of items; sequential analysis identifies the order of those patterns or groups of items.
SEQUENCE
Finds patterns between events such that the presence of one set of items is followed by another set of items in a database of events over a period of time.
e.g. Used to understand long-term customer buying behavior.
Link Analysis - Similar Time Sequence Discovery
Finds links between two sets of data that are time-dependent, and is based on the degree of similarity between the patterns that both time series demonstrate.
e.g. Within three months of buying property, new home owners will purchase goods such as cookers, freezers, and washing machines.
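A pattern like the home-owner example can be checked against an event log by looking for one event followed by another within a time window. This is only a sketch: the event log, the customer IDs, and the choice to treat "three months" as 90 days are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical event log: (customer, purchase date, item).
events = [
    ("c1", date(2023, 1, 10), "property"),
    ("c1", date(2023, 2, 20), "washing machine"),
    ("c2", date(2023, 3, 5), "property"),
    ("c2", date(2023, 9, 1), "freezer"),
    ("c3", date(2023, 4, 1), "property"),
    ("c3", date(2023, 4, 15), "cooker"),
    ("c4", date(2023, 5, 1), "cooker"),
]

APPLIANCES = {"cooker", "freezer", "washing machine"}
WINDOW = timedelta(days=90)  # assumption: "three months" approximated as 90 days

def followed_within(events, first_item, later_items, window):
    """Return (customers who bought first_item,
    customers who then bought any of later_items within the window)."""
    first_dates = {}
    for customer, when, item in events:
        if item == first_item:
            # Keep the earliest purchase of first_item per customer.
            first_dates[customer] = min(when, first_dates.get(customer, when))
    followers = set()
    for customer, when, item in events:
        start = first_dates.get(customer)
        if start is not None and item in later_items and start < when <= start + window:
            followers.add(customer)
    return set(first_dates), followers

buyers, followers = followed_within(events, "property", APPLIANCES, WINDOW)
print(sorted(followers), f"{len(followers)} of {len(buyers)} property buyers")
```

Unlike association, the order matters here: customer c4 bought a cooker but never bought property first, so c4 does not count toward the pattern.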
Data Mining Tools
CLUSTERING
Cluster analysis lets the data miner assemble data into unforeseen groups containing similar characteristics. Also known as "segmentation," this type of data mining is probably the most widely used.
CLUSTERING
Aim is to partition a database into an unknown number of segments, or clusters, of similar records.
Uses unsupervised learning to discover homogeneous sub-populations in a database to improve the accuracy of the profiles.
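One simple way to segment records without fixing the number of clusters in advance is a "leader" style pass: each record joins the nearest existing cluster if it is close enough, and otherwise starts a new one. The sketch below is one such approach, not the method of any particular tool; the customer records and the distance threshold are hypothetical, and the features are left unscaled for brevity.

```python
import math

# Hypothetical customer records: (age, annual spend in thousands).
records = [(25, 2.0), (27, 2.5), (26, 1.8), (55, 9.0), (58, 9.5), (40, 30.0)]

def leader_cluster(points, threshold):
    """Assign each point to the nearest cluster whose centroid lies within
    `threshold`; otherwise start a new cluster. No labels are used, and the
    number of clusters is discovered rather than specified."""
    clusters = []  # each cluster is a list of points
    for p in points:
        best, best_dist = None, threshold
        for cluster in clusters:
            centroid = tuple(sum(c) / len(cluster) for c in zip(*cluster))
            d = math.dist(p, centroid)
            if d <= best_dist:
                best, best_dist = cluster, d
        if best is None:
            clusters.append([p])
        else:
            best.append(p)
    return clusters

clusters = leader_cluster(records, threshold=5.0)
for i, cluster in enumerate(clusters, start=1):
    print(i, cluster)
```

With this data and threshold the pass yields three segments. The result depends on the threshold and on the order of the records, which is one reason production tools use more robust algorithms.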
Data Mining Tools
PREDICTIVE MODELING
As the name implies, predictive modeling involves developing a model from historical data for predicting a future event. The power of predictive modeling engines is that they can use a broad range of data attributes to identify future behavior. Both cluster analysis and predictive modeling tools identify distinct groups of items with common attributes; the difference is that predictive modeling focuses on the likelihood of a particular outcome for a particular group.
PREDICTIVE MODELING
Similar to the human learning experience
– uses observations to form a model of the important characteristics of some phenomenon.
Uses generalizations of the ‘real world’ and the ability to fit new data into a general framework.
Can analyze a database to determine essential characteristics (a model) of the data set.
PREDICTIVE MODELING
There are two techniques associated with predictive modeling: classification and value prediction, which are distinguished by the nature of the variable being predicted.
PREDICTIVE MODELING-classification
Used to establish a specific predetermined class for each record in a database from a finite set of possible class values.
Two specializations of classification: tree induction and neural induction.
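Example of Classification using Tree Induction (figure not reproduced): a decision tree that tests car = taurus, city = seattle, and age < 45, each with y/n branches, ending in leaves labeled likely or unlikely.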
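The tests that survive from the tree-induction figure (car = taurus, city = seattle, age < 45) can be written as a small hand-coded tree to show how a record is classified once a tree exists. The arrangement of the branches and leaves below is an assumption, since the original figure layout is not recoverable, and the code only applies a tree; it does not induce one from data.

```python
def classify(record):
    """Walk a small hand-written decision tree and return a class label.

    Assumed layout (hypothetical): the root tests the car model; the 'yes'
    branch then tests the city, and the 'no' branch tests the age.
    """
    if record["car"] == "taurus":
        return "likely" if record["city"] == "seattle" else "unlikely"
    return "likely" if record["age"] < 45 else "unlikely"

# Hypothetical records to classify.
customers = [
    {"car": "taurus", "city": "seattle", "age": 50},
    {"car": "taurus", "city": "boston", "age": 30},
    {"car": "civic", "city": "seattle", "age": 30},
    {"car": "civic", "city": "boston", "age": 60},
]
labels = [classify(c) for c in customers]
print(labels)  # ['likely', 'unlikely', 'likely', 'unlikely']
```

Every record ends at exactly one leaf, which is what makes the output a single class from a finite set of class values.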
Example of Classification using Neural Induction
PREDICTIVE MODELING- Value Prediction
Used to estimate a continuous numeric value that is associated with a database record.
Uses the traditional statistical techniques of linear regression and nonlinear regression.
Relatively easy to use and understand.
PREDICTIVE MODELING- Value Prediction
Linear regression attempts to fit a straight line through a plot of the data, such that the line is the best representation of the average of all observations at that point in the plot.
The problem is that the technique only works well with linear data and is sensitive to the presence of outliers (that is, data values that do not conform to the expected norm).
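The sensitivity to outliers is easy to see with a small ordinary least squares fit. The sketch below fits a line to made-up data twice, once as-is and once with a single outlier; the data values are hypothetical and chosen so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [1, 2, 3, 4, 5]
ys_clean = [2, 4, 6, 8, 10]     # perfectly linear: y = 2x
ys_outlier = [2, 4, 6, 8, 40]   # same data with one outlier in the last point

clean = fit_line(xs, ys_clean)
skewed = fit_line(xs, ys_outlier)
print(clean)   # (2.0, 0.0)
print(skewed)  # (8.0, -12.0)
```

A single unusual observation moves the slope from 2 to 8, so the fitted line no longer represents the other four points well.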
PREDICTIVE MODELING- Value Prediction
Although nonlinear regression avoids the main problems of linear regression, it is still not flexible enough to handle all possible shapes of the data plot.
Statistical measurements are fine for building linear models that describe predictable data points; however, most data is not linear in nature.
PREDICTIVE MODELING- Value Prediction
Data mining requires statistical methods that can accommodate non-linearity, outliers, and non-numeric data.
Applications of value prediction include credit card fraud detection and target mailing list identification.
ARE YOU READY FOR DATA MINING?
Just because you have a data warehouse doesn’t mean you’re necessarily ready for data mining. Much of the work our company does in the data mining arena has more to do with data mining readiness assessment than with actually performing data mining.
Metrics you can use to gauge your data mining readiness
Do you have a staff of experienced knowledge workers?
Do you have the data?
Do you have marketing processes in place that can use this data?
Do you have a business champion who can embrace the process and results?
Do you have the technology infrastructure to support advanced analysis?
OLAP vs. Mining Tools
OLAP tools:
Are ad hoc, shrink-wrapped tools that provide an interface to data
Are used when you have specific questions
Look and feel like a spreadsheet that allows rotation, slicing, and graphics
Can be deployed to a large number of users
Mining tools:
Are methods for analyzing multiple data types
-- Regression trees
-- Neural networks
-- Genetic algorithms
Are usually textual in nature
Are usually deployed to a small number of analysts