Post on 12-Jan-2016

Data Mining

© 2012 Raphael Saulus

…not as hard as it sounds!

Data Mining for Fun & Profit

Russ Blake, Principal Architect, Runge Limited

Kevin Clarke, Manager Software Development, EB Games Australia, NZ

Darien Nagle, Solution Specialist, Application Platform, Microsoft Corporation

DBI226

Who Is Russ Blake?

or… would you buy a Performance Monitor from this dude?

Who is Russ Blake?

Manager of Performance: Windows NT

Inventor & Author of Perfmon

Wrote NT Resource Kit book “Optimizing Windows NT”

Holds 3 US patents and one Chinese patent

But, what have you done for me lately?

Fundamental physics. 2 recent papers:

The effect of particle creation on space: explains why things fall

The architecture of nuclear binding energy: 7 times more accurate than the next best model of the nucleus

Runge Ltd

Planning, Scheduling and Forecasting

Mine Planning Consultancy

Why is Russ at Runge?

World’s leading mine planning software

Firm roots in applied mathematical modelling

Really smart people

Firm commitment to innovation

Focused on planning and forecasting

Forecasting:

“Predicting is Hard…

…especially about the future!”

--Yogi Berra

Data Mining to the rescue!

The future will be like the past…

because…

in the past…

the future was like the past!

-- Gerald M. Weinberg, An Introduction to General Systems Thinking

© 2011 Microsoft Corporation

Why are we here?

“Data Mining is the top technology to have a major impact across a wide range of industries in Australia within the next 5 years…

…and has the greatest skills gap!”

©2012 Gartner Group Advanced Technology Research Note

What it is…

Data Mining finds patterns in data

Uses these patterns to make predictions

Using Machine Learning Algorithms

Don’t worry: the hard yards are done

A lot at Microsoft Research

© 2012 http://www.holdemreview.com

How We Do It @

Market Basket Analysis

Customer Loyalty Program – Links purchases to the individual

Customer Characteristics

Purchasing patterns – Enables direct marketing

Forecasting

Year on year sales analysis – Allows for more realistic comps sales

Clickstream Analysis

Almost real time e-commerce sales reporting
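To make the Market Basket Analysis item concrete, here is a minimal sketch of the three numbers such an analysis usually reports for a rule "customers who buy A also buy B": support, confidence and lift. The baskets and item names are invented for illustration, and `pair_rules` is a hypothetical helper, not part of any product shown in this talk.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets; item names are made up for illustration.
baskets = [
    {"console", "controller", "game"},
    {"console", "game"},
    {"controller", "game"},
    {"console", "controller"},
    {"game"},
]

def pair_rules(baskets):
    """Return {(a, b): (support, confidence, lift)} for every rule a -> b."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (x, y), both in pair_counts.items():
        for a, b in ((x, y), (y, x)):
            support = both / n                        # share of baskets with both items
            confidence = both / item_counts[a]        # P(b | a)
            lift = confidence / (item_counts[b] / n)  # P(b | a) / P(b)
            rules[(a, b)] = (support, confidence, lift)
    return rules

rules = pair_rules(baskets)
for rule, (support, confidence, lift) in sorted(rules.items()):
    print(rule, round(support, 2), round(confidence, 2), round(lift, 2))
```

A lift above 1 suggests the two items sell together more often than chance would predict; a lift below 1 suggests the opposite.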

Data Mining is Self-sufficient

Data Mining does not need a Cube!

What it’s not

SSAS ≠ Cube

© blog.viXra.org

© recultured.com

Look, Ma, No Cube!

Cube: Pretty high barrier to entry

Dimensional Modelling:

Build a Cube

Learn MDX

Construct Analyses

…of the PAST

Data Mining: Pretty low barrier to entry

Data Mining:

Build Structure

Configure Model

Make Predictions

…about the Future

© 2012 onlyHDwallPapers.com

© 2012 Microsoft Corporation

Why no Cube?

Data mining finds patterns in data

Cubes abstract much of the interesting information

Data Mine directly on your Data Warehouse

(or [“shudder”] on your operational database)

…but now we do have read-only mirrors!

When to Data Mine a Cube

Complex calculations determine outcome

Feed results in as new Cube data

(Caveat: Cannot feed data into original Cube)

Can it really be this easy?

Excel Data Mining Add-in

Contrasting Time Series Example

Caveat: Correlation ≠ Causation!

© 2011 xkcd.com

Caveat: Beware the Black Swan!

The Black Swan: The Impact of the Highly Improbable

Nassim Nicholas Taleb

Central Thesis: All significant events are unpredictable!

SQL 2008 Data Mining Videos …msdn

http://msdn.microsoft.com/en-us/library/dd776389%28v=SQL.100%29.aspx

Tutorial:

Logical Architecture

Demo

Your one-stop-shop for data mining

Data Mining Designer

But is it Respectable?

Is it all just smoke and mirrors?

Or…

Was Data Mining invented just to make Astrology look respectable?

How it works inside

Public Domain (Wikipedia Commons)

Decision Tree Algorithm

Correlation Tree Node

Cluster & Association Algorithms

Naïve Bayes Algorithm

• Simple, fast, surprisingly accurate

• “Naïve”: attributes assumed to be independent of each other

• Pervasive use throughout Data Mining

• Uses Bayes Law:

P(Result | Data) = P(Data | Result) * P(Result) / P(Data)
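The "naïve" independence assumption is what makes Bayes Law cheap to apply to many attributes at once: P(Data | Result) becomes a simple product of per-attribute probabilities. The sketch below shows that idea on a tiny invented table. The attribute values, labels and function names are all hypothetical, and it uses raw counts with no smoothing, so it is an illustration of the principle rather than the algorithm shipped in Analysis Services.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count how often each label occurs and how often each attribute value occurs per label."""
    label_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, attribute index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return label_counts, value_counts

def posterior(row, label_counts, value_counts):
    """Return P(label | row), treating attributes as independent given the label.

    Assumes at least one label has been seen with every value in `row`;
    no smoothing is applied for unseen values.
    """
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = count / total  # P(Result)
        for i, value in enumerate(row):
            score *= value_counts[(label, i)][value] / count  # P(attribute | Result)
        scores[label] = score
    evidence = sum(scores.values())  # P(Data), summed over all results
    return {label: score / evidence for label, score in scores.items()}

# Hypothetical training data: (age band, commute distance) -> bought a bike?
rows = [("young", "short"), ("young", "long"), ("old", "short"),
        ("young", "long"), ("old", "long"), ("old", "short")]
labels = ["buy", "buy", "buy", "no", "no", "no"]

model = train(rows, labels)
result = posterior(("young", "short"), *model)
print(result)
```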

Naïve Bayes Algorithm

P(Girl | Trousers) = ?

P(Trousers | Girl) = 20 / 40

P(Girl) = 40 / 100

P(Trousers) = 80 / 100

P(Girl | Trousers) = P(Trousers | Girl) P(Girl) / P(Trousers)

= (20 / 40) (40 / 100) / (80 / 100) = 20 / 80 = 0.25
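The trousers arithmetic can be checked in a few lines. This sketch uses exact fractions so the intermediate values match the slide; the variable names are chosen here for readability and are not from the talk.

```python
from fractions import Fraction

# Probabilities exactly as given on the slide.
p_trousers_given_girl = Fraction(20, 40)
p_girl = Fraction(40, 100)
p_trousers = Fraction(80, 100)

# Bayes Law: P(Girl | Trousers) = P(Trousers | Girl) * P(Girl) / P(Trousers)
p_girl_given_trousers = p_trousers_given_girl * p_girl / p_trousers

print(p_girl_given_trousers, float(p_girl_given_trousers))  # 1/4 0.25
```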

Neural Network Algorithm

[Diagram: a network with three layers labelled Input Neurons, Hidden Neurons and Output Neurons. Node labels: Loc, Sex, Age, Buy, No. Every connection between layers is marked W (weight).]

• Multilayer Perceptron Network, aka Back-Propagated Delta Rule Network
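As a rough illustration of what the diagram's weights do, here is a forward pass through a small multilayer perceptron. The 3-3-2 shape (three inputs, three hidden neurons, two outputs) is an inference from the 15 weight labels on the slide, and the numeric encoding of the inputs, the sigmoid activation and all weight values are assumptions made for this sketch. Training by back-propagation is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Each neuron takes a weighted sum of its inputs plus a bias, then applies a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Propagate inputs through the hidden layer and then the output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)

# Hypothetical weights: 3 inputs x 3 hidden neurons, then 3 hidden x 2 outputs.
hidden_w = [[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2], [0.1, 0.1, -0.4]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.6, -0.1, 0.3], [-0.5, 0.2, 0.1]]
output_b = [0.0, 0.0]

# Hypothetical encoded customer: (Loc, Sex, Age), each scaled to the range 0..1.
buy, no = forward([1.0, 0.0, 0.35], hidden_w, hidden_b, output_w, output_b)
print(buy, no)
```

Back-propagation would adjust each W to shrink the gap between these outputs and the known outcomes in the training data.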

Comparing Algorithms

Bike Buyers

Population Random: 50%

Targeted Data Mining: 85%

Ideal: 100%

Lift Chart Operation
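A lift chart answers one question: if we contact only the top-scoring fraction of the population, what share of the real buyers do we reach? The sketch below computes that for a handful of invented customers. The scores, outcomes and the `cumulative_gains` name are hypothetical; the numbers are not the Bike Buyers results quoted on the slide.

```python
def cumulative_gains(scores, actual, fractions=(0.2, 0.5, 1.0)):
    """For each fraction of the population contacted (highest scores first),
    return the share of all actual buyers captured."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda p: p[0], reverse=True)]
    total_buyers = sum(ranked)
    gains = {}
    for f in fractions:
        k = round(f * len(ranked))  # number of customers contacted
        gains[f] = sum(ranked[:k]) / total_buyers
    return gains

# Hypothetical model scores and real outcomes (1 = bought) for ten customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

gains = cumulative_gains(scores, actual)
print(gains)
```

With these made-up numbers, contacting the top half of the list reaches 75% of the buyers, where a random half would be expected to reach 50%.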

Time Series

Combines 2 algorithms

ARTxp (Auto-Regressive Tree with Cross-Prediction): Short-term prediction

ARIMA (Auto-Regressive Integrated Moving Average): Long-term prediction

ARIMA

Handles dependencies

Handles shocks

Detect periodicity:

Fourier Transform
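To show how a Fourier transform exposes periodicity, the sketch below runs a plain discrete Fourier transform over a series and reports the period of the strongest component. It is a slow, textbook DFT written with the standard library only, and the monthly series is synthetic. A production tool would use a fast Fourier transform; this is not the product's implementation.

```python
import cmath
import math

def dominant_period(series):
    """Return the period (in samples) of the strongest non-constant DFT component."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant level first
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

# Hypothetical monthly series with a 12-month cycle, four years long.
series = [10 + 3 * math.sin(2 * math.pi * t / 12) for t in range(48)]
print(dominant_period(series))  # 12.0
```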

ARTxp and ARIMA Blended

Default PREDICTION_SMOOTHING = 0.5

ARTxp and ARIMA Blended

PREDICTION_SMOOTHING = 0.2
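One way to picture what PREDICTION_SMOOTHING controls is as a weight between the two forecasts, where 0 means ARTxp only and 1 means ARIMA only. The sketch below applies a single constant weight to every forecast step. That is a deliberate simplification made for illustration: the product's own blending varies the weighting across the forecast horizon, and the `blend` function and forecast values here are hypothetical.

```python
def blend(artxp_forecast, arima_forecast, prediction_smoothing=0.5):
    """Constant linear blend: 0 keeps only the ARTxp values, 1 keeps only the ARIMA values."""
    if not 0.0 <= prediction_smoothing <= 1.0:
        raise ValueError("prediction_smoothing must be between 0 and 1")
    w = prediction_smoothing
    return [(1 - w) * a + w * b for a, b in zip(artxp_forecast, arima_forecast)]

# Hypothetical forecasts from each algorithm for the next three periods.
artxp = [10.0, 10.0, 10.0]
arima = [20.0, 20.0, 20.0]

print(blend(artxp, arima))       # default weight of 0.5
print(blend(artxp, arima, 0.2))  # leans towards the ARTxp values
```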

Take-Aways

Off-the-shelf toolkit

No Cube required

No code required

Good default parameters

Easily explored models

Change parameters, filter input, compare lift

Excel Add-In

Caveats:

Correlation ≠ Causation

Beware the Black Swan

References

Data Mining Add-ins
http://office.microsoft.com/en-us/excel-help/data-mining-add-ins-HA010342915.aspx#_Toc257717762

Analysis Services - Data Mining Videos
http://msdn.microsoft.com/en-us/library/dd776389(v=SQL.100).aspx

SQL Server Data Mining Home
http://www.sqlserverdatamining.com/ssdm/

Microsoft Contoso BI Demo Dataset for Retail Industry
http://www.microsoft.com/downloads/en/details.aspx?displaylang=en&FamilyID=868662dc-187a-4a85-b611-b7df7dc909fc

What Every IT Manager Should Know About Business Users’ Real Needs for BI
http://docs.media.bitpipe.com/io_25x/io_25515/item_392177/Tableau_S_MktgLtr_BI_IT.pdf

An Introduction to Data Mining: Discovering hidden value in your data warehouse
http://www.thearling.com/text/dmwhite/dmwhite.htm

Related Content

Database and Business Intelligence Track: All Sessions

Exam 467 (new) or 460 (upgrade) to MCSE Business Intelligence

Find Me Later At the Friday 11AM Meet and Greet

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
