Post on 12-Jan-2016
Data Mining
© 2012 Raphael Saulus
…not as hard as it sounds!
Data Mining for Fun & Profit
Russ Blake, Principal Architect, Runge Limited
Kevin Clarke, Manager Software Development, EB Games Australia, NZ
Darien Nagle, Solution Specialist, Application Platform, Microsoft Corporation
DBI226
Who Is Russ Blake?
or… would you buy a Performance Monitor from this dude?
Who is Russ Blake?
Manager of Performance: Windows NT
Inventor & Author of Perfmon
Wrote NT Resource Kit book “Optimizing Windows NT”
Holds 3 US patents and one Chinese patent
But, what have you done for me lately?
Fundamental physics. 2 recent papers:
The effect of particle creation on space
  Explains why things fall
The architecture of nuclear binding energy
  7 times more accurate than the next best model of the nucleus
Runge Ltd
Planning, Scheduling and Forecasting
Mine Planning Consultancy
Why is Russ at Runge?
World’s leading mine planning software
Firm roots in applied mathematical modelling
Really smart people
Firm commitment to innovation
Focused on planning and forecasting
Forecasting:
“Predicting is Hard… especially about the future!”
--Yogi Berra
Data Mining to the rescue!
© 2011 Microsoft Corporation
The future will be like the past…
because…
in the past…
the future was like the past!
-- Gerald M. Weinberg, An Introduction to General Systems Thinking
Why are we here?
“Data Mining is the top technology to have a major impact across a wide range of industries in Australia within the next 5 years… and has the greatest skills gap!”
©2012 Gartner Group Advanced Technology Research Note
What it is…
Data Mining finds patterns in data
Uses these patterns to make predictions
Using Machine Learning Algorithms
Don’t worry: the hard yards are done
A lot at Microsoft Research
© 2012 http://www.holdemreview.com
How We Do It @
Market Basket Analysis
  Customer Loyalty Program – Links purchases to the individual
Customer Characteristics
  Purchasing patterns – Enables direct marketing
Forecasting
  Year on year sales analysis – Allows for more realistic comps sales
Clickstream Analysis
  Almost real time e-commerce sales reporting
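To make Market Basket Analysis a little more concrete, here is a minimal Python sketch of the three numbers such an analysis usually reports for a rule "customers who buy A also buy B": support, confidence and lift. The baskets and item names are invented for illustration, and the function name `pair_rules` is hypothetical; this is not the algorithm used in the talk's demo.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets):
    """Return {(a, b): (support, confidence, lift)} for every rule a -> b."""
    n = len(baskets)
    # How many baskets contain each item, and each unordered pair of items.
    item_count = Counter(item for basket in baskets for item in set(basket))
    pair_count = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), both in pair_count.items():
        for x, y in ((a, b), (b, a)):
            support = both / n                       # share of baskets with both items
            confidence = both / item_count[x]        # P(y in basket | x in basket)
            lift = confidence / (item_count[y] / n)  # confidence relative to P(y)
            rules[(x, y)] = (support, confidence, lift)
    return rules

# Hypothetical baskets, invented for illustration only.
baskets = [
    ["console", "controller", "game"],
    ["console", "controller"],
    ["game", "gift card"],
    ["controller", "game"],
]
rules = pair_rules(baskets)
support, confidence, lift = rules[("console", "controller")]
print(support, confidence, round(lift, 3))
```

A lift above 1 suggests the two items appear together more often than chance alone would predict, which is the kind of pattern a loyalty program can then tie back to individual customers.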
Data Mining is Self-sufficient
Data Mining does not need a Cube!
What it’s not
SSAS ≠ Cube
© blog.viXra.org
© recultured.com
Look, Ma, No Cube!
Cube:
  Dimensional Modelling: Build a Cube, Learn MDX, Construct Analyses …of the PAST
  Pretty high barrier to entry
Data Mining:
  Data Mining: Build Structure, Configure Model, Make Predictions …about the Future
  Pretty low barrier to entry
© 2012 onlyHDwallPapers.com
© 2012 Microsoft Corporation
Why no Cube?
Data mining finds patterns in data
Cubes abstract much of the interesting information
Data Mine directly on your Data Warehouse
(or [“shudder”] on your operational database)
…but now we do have read-only mirrors!
When to Data Mine a Cube
Complex calculations determine outcome
Feed results in as new Cube data
(Caveat: Cannot feed data into original Cube)
Can it really be this easy?
Excel Data Mining Add-in
Contrasting Time Series Example
Caveat: Correlation ≠ Causation!
© 2011 xkcd.com
Caveat: Beware the Black Swan!
The Black Swan: The Impact of the Highly Improbable
Nassim Nicholas Taleb
Central Thesis: All significant events are unpredictable!
SQL 2008 Data Mining Videos …msdn
http://msdn.microsoft.com/en-us/library/dd776389%28v=SQL.100%29.aspx
Tutorial:
Logical Architecture
Demo
Your one-stop-shop for data mining
Data Mining Designer
But is it Respectable?
Is it all just smoke and mirrors?
Or…
Was Data Mining invented just to make Astrology look respectable?
How it works inside
Public Domain (Wikipedia Commons)
Decision Tree Algorithm
Correlation Tree Node
Cluster & Association Algorithms
Naïve Bayes Algorithm
• Simple, fast, surprisingly accurate
• “Naïve”: attributes assumed to be independent of each other
• Pervasive use throughout Data Mining
• Uses Bayes Law:
P(Result | Data) = P(Data | Result) * P(Result) / P(Data)
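The "naïve" independence assumption is what keeps the algorithm simple: P(Data | Result) becomes a product of one probability per attribute. A minimal Python sketch follows, assuming categorical attributes and no smoothing. The training rows (age band, commute length, bike purchase) are invented for illustration, and the function names `train` and `posterior` are hypothetical; this is not the Analysis Services implementation.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts

def posterior(row, class_counts, value_counts):
    """P(class | row), treating attributes as independent given the class."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # P(Result)
        for i, value in enumerate(row):
            # P(attribute value | Result), multiplied in one attribute at a time
            score *= value_counts[(label, i)][value] / count
        scores[label] = score
    evidence = sum(scores.values())  # P(Data); zero if the row matches no class
    return {label: score / evidence for label, score in scores.items()}

# Hypothetical training data: (age band, commute) -> bought a bike?
rows = [
    ("young", "short"), ("young", "long"), ("old", "short"),
    ("old", "long"), ("young", "long"), ("old", "long"),
]
labels = ["yes", "yes", "yes", "no", "no", "no"]

class_counts, value_counts = train(rows, labels)
result = posterior(("young", "long"), class_counts, value_counts)
print(result)
```

Real implementations usually add smoothing so that an attribute value never seen with a class does not force that class's probability to zero.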
Naïve Bayes Algorithm
P(Girl | Trousers) = ?
P(Trousers | Girl) = 20 / 40
P(Girl) = 40 / 100
P(Trousers) = 80 / 100
P(Girl | Trousers) = P(Trousers | Girl) P(Girl) / P(Trousers)
= (20 / 40) (40 / 100) / (80 / 100) = 20 / 80 = 0.25
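The arithmetic above can be checked in a few lines of Python. Exact fractions are used so the result matches the slide digit for digit; the helper name `bayes` is only for illustration.

```python
from fractions import Fraction

def bayes(p_data_given_result, p_result, p_data):
    """Bayes Law: P(Result | Data) = P(Data | Result) * P(Result) / P(Data)."""
    return p_data_given_result * p_result / p_data

p_trousers_given_girl = Fraction(20, 40)
p_girl = Fraction(40, 100)
p_trousers = Fraction(80, 100)

p_girl_given_trousers = bayes(p_trousers_given_girl, p_girl, p_trousers)
print(p_girl_given_trousers, float(p_girl_given_trousers))  # 1/4 0.25
```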
Neural Network Algorithm
[Diagram: Input Neurons (Loc, Sex, Age), Hidden Neurons and Output Neurons (Buy, No), joined by connections marked W; further labels: Weight, Weight2, Weight3]
• Multilayer Perceptron Network aka Back-Propagated Delta Rule Network
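A minimal Python sketch of the forward pass through such a network: each neuron takes a weighted sum of its inputs (the W values in the diagram) and squashes it with a sigmoid. The numeric encoding of Loc, Sex and Age, the two hidden neurons and all weight values are assumptions made for illustration. Training, where back-propagation adjusts the weights, is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: each neuron outputs sigmoid(weighted sum + bias)."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input neurons -> hidden neurons -> output neurons."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)

# Hypothetical inputs: Loc, Sex, Age already encoded as numbers in [0, 1].
inputs = [1.0, 0.0, 0.35]

# Hypothetical weights: 2 hidden neurons with 3 inputs each, 2 output neurons.
hidden_w = [[0.4, -0.2, 0.9], [-0.7, 0.1, 0.3]]
hidden_b = [0.0, 0.1]
output_w = [[1.2, -0.8], [-1.2, 0.8]]
output_b = [0.0, 0.0]

buy, no = forward(inputs, hidden_w, hidden_b, output_w, output_b)
print(buy, no)

# With every weight and bias at zero, each sigmoid outputs exactly 0.5.
untrained = forward(inputs, [[0.0] * 3] * 2, [0.0] * 2, [[0.0] * 2] * 2, [0.0] * 2)
```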
Comparing Algorithms
Bike Buyers
Population Targeted
Random: 50%
Data Mining: 85%
Ideal: 100%
Lift Chart Operation
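A lift chart ranks cases by the model's predicted probability and asks what share of the real buyers falls inside the top slice of the population. The Python sketch below computes one point on such a chart and compares it with the random and ideal lines. The scores and outcomes are invented for illustration, and the function name `cumulative_gains` is hypothetical.

```python
def cumulative_gains(scores, actual, fraction):
    """Share of all actual buyers found in the top `fraction` of cases ranked by score."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    cutoff = round(len(ranked) * fraction)
    found = sum(is_buyer for _, is_buyer in ranked[:cutoff])
    return found / sum(actual)

# Hypothetical model scores and real outcomes (1 = bought a bike).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

fraction = 0.5
model = cumulative_gains(scores, actual, fraction)
random_baseline = fraction  # random targeting finds buyers in proportion
ideal = min(1.0, fraction * len(actual) / sum(actual))  # a perfect ranking

print(random_baseline, model, ideal)  # 0.5 0.75 1.0
```

The gap between the model's line and the random line is the lift; comparing that gap across algorithms on the same data is how the models are ranked.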
Time Series
Combines 2 algorithms
ARTxp: Short-term prediction
ARIMA: Long-term prediction
ARTxp: Auto-Regressive Tree with cross-prediction
ARIMA: Auto-Regressive Integrated Moving Average
ARIMA: the auto-regressive part handles dependencies
ARIMA: the moving-average part handles shocks
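The difference between "handles dependencies" and "handles shocks" can be seen in a tiny simulation. The Python sketch below is an ARMA(1,1) recurrence only: it leaves out the differencing step that ARIMA adds, and it is not the Analysis Services implementation. The coefficient names `phi` and `theta` follow common textbook usage.

```python
def arma_1_1(shocks, phi, theta, start=0.0):
    """x[t] = phi * x[t-1] + shock[t] + theta * shock[t-1]."""
    series = []
    prev_value, prev_shock = start, 0.0
    for shock in shocks:
        value = phi * prev_value + shock + theta * prev_shock
        series.append(value)
        prev_value, prev_shock = value, shock
    return series

# A single unit shock at t = 0, then nothing.
shocks = [1.0, 0.0, 0.0, 0.0]

# Auto-regressive term only: each value depends on the previous value,
# so the shock decays gradually.
ar_only = arma_1_1(shocks, phi=0.5, theta=0.0)
print(ar_only)  # [1.0, 0.5, 0.25, 0.125]

# Moving-average term only: each value depends on the previous shock,
# so the effect is gone after one step.
ma_only = arma_1_1(shocks, phi=0.0, theta=0.5)
print(ma_only)  # [1.0, 0.5, 0.0, 0.0]
```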
Detect periodicity:
Fourier Transform
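A minimal Python sketch of how a Fourier transform can reveal periodicity: compute the strength of each frequency in the series and report the period of the strongest one. The monthly series with a 12-month cycle is synthetic, and the function name `dominant_period` is hypothetical. The plain discrete Fourier transform is used here for clarity; a real tool would use a fast Fourier transform.

```python
import cmath
import math

def dominant_period(values):
    """Return the period (in samples) of the strongest frequency in the series."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]  # remove the constant level first
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        # Discrete Fourier transform coefficient for k cycles over the series.
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(centred)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k  # assumes the series is not constant

# Synthetic monthly data: 4 years with a 12-month seasonal cycle.
values = [100 + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
period = dominant_period(values)
print(period)
```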
ARTxp and ARIMA Blended
Default PREDICTION_SMOOTHING = 0.5
PREDICTION_SMOOTHING = 0.2
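PREDICTION_SMOOTHING ranges from 0 (ARTxp only) to 1 (ARIMA only). As a rough mental model, the Python sketch below mixes the two forecasts with one fixed weight. This is a deliberate simplification: the weighting Analysis Services actually applies is more involved and is not reproduced here. The forecast numbers and the function name `blend` are invented for illustration.

```python
def blend(artxp_forecast, arima_forecast, prediction_smoothing=0.5):
    """Fixed-weight mix: 0 keeps ARTxp only, 1 keeps ARIMA only (simplified)."""
    if not 0.0 <= prediction_smoothing <= 1.0:
        raise ValueError("prediction_smoothing must be between 0 and 1")
    return [
        (1.0 - prediction_smoothing) * artxp + prediction_smoothing * arima
        for artxp, arima in zip(artxp_forecast, arima_forecast)
    ]

# Hypothetical forecasts for the next two periods.
artxp_forecast = [10.0, 20.0]
arima_forecast = [20.0, 40.0]

default_mix = blend(artxp_forecast, arima_forecast)       # smoothing 0.5
artxp_heavy = blend(artxp_forecast, arima_forecast, 0.2)  # leans toward ARTxp
print(default_mix, artxp_heavy)
```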
Take-Aways
Off-the-shelf toolkit
No Cube required
No code required
Good default parameters
Easily explored models
Change parameters, filter input, compare lift
Excel Add-In
Caveats:
Correlation ≠ Causation
Beware the Black Swan
References
Data Mining Add-ins
http://office.microsoft.com/en-us/excel-help/data-mining-add-ins-HA010342915.aspx#_Toc257717762
Analysis Services - Data Mining Videos
http://msdn.microsoft.com/en-us/library/dd776389(v=SQL.100).aspx
SQL Server Data Mining Home
http://www.sqlserverdatamining.com/ssdm/
Microsoft Contoso BI Demo Dataset for Retail Industry
http://www.microsoft.com/downloads/en/details.aspx?displaylang=en&FamilyID=868662dc-187a-4a85-b611-b7df7dc909fc
What Every IT Manager Should Know About Business Users’ Real Needs for BI
http://docs.media.bitpipe.com/io_25x/io_25515/item_392177/Tableau_S_MktgLtr_BI_IT.pdf
An Introduction to Data Mining: Discovering hidden value in your data warehouse
http://www.thearling.com/text/dmwhite/dmwhite.htm
Related Content
Database and Business Intelligence Track: All Sessions
Exam 467 (new) or 460 (upgrade) to MCSE Business Intelligence
Find Me Later At the Friday 11AM Meet and Greet
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.