© 2012 raphael saulus. data mining for fun & profit russ blake principle architect, runge...


Upload: ambrose-wheeler

Post on 12-Jan-2016


Page 1: © 2012 Raphael Saulus. Data Mining for Fun & Profit Russ Blake Principle Architect, Runge Limited Kevin Clarke Manager Software Development EB Games Australia,

Data Mining

© 2012 Raphael Saulus

…not as hard as it sounds!

Data Mining for Fun & Profit

Russ Blake
Principal Architect, Runge Limited

Kevin Clarke
Manager Software Development, EB Games Australia, NZ

Darien Nagle
Solution Specialist, Application Platform, Microsoft Corporation

DBI226

Who Is Russ Blake?

or… would you buy a Performance Monitor from this dude?

Who is Russ Blake?

Manager of Performance: Windows NT
Inventor & Author of Perfmon
Wrote NT Resource Kit book “Optimizing Windows NT”
Holds 3 US patents and one Chinese patent

But, what have you done for me lately?

Fundamental physics. 2 recent papers:
The effect of particle creation on space: explains why things fall
The architecture of nuclear binding energy: 7 times more accurate than the next best model of the nucleus


Runge Ltd

Planning, Scheduling and Forecasting

Mine Planning Consultancy

Why is Russ at Runge?

World’s leading mine planning software

Firm roots in applied mathematical modelling

Really smart people

Firm commitment to innovation

Focused on planning and forecasting

Forecasting:

“Predicting is Hard… especially about the future!”
-- Yogi Berra

Data Mining to the rescue!

The future will be like the past… because… in the past… the future was like the past!
-- Gerald M. Weinberg, An Introduction to General Systems Thinking

© 2011 Microsoft Corporation

Why are we here?

“Data Mining is the top technology to have a major impact across a wide range of industries in Australia within the next 5 years… and has the greatest skills gap!”

© 2012 Gartner Group Advanced Technology Research Note

What it is…

Data Mining finds patterns in data
Uses these patterns to make predictions
Using Machine Learning Algorithms
Don’t worry: the hard yards are done
A lot at Microsoft Research

© 2012 http://www.holdemreview.com

How We Do It @

Market Basket Analysis
Customer Loyalty Program – Links purchases to the individual

Customer Characteristics
Purchasing patterns – Enables direct marketing

Forecasting
Year on year sales analysis – Allows for more realistic comps sales

Clickstream Analysis
Almost real time e-commerce sales reporting
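To make the Market Basket Analysis item concrete, here is a minimal Python sketch of the usual support, confidence and lift measures for item pairs. The baskets and product names are invented for illustration and do not come from the talk; the SQL Server association algorithm is not reproduced here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
baskets = [
    {"console", "controller", "game"},
    {"console", "game"},
    {"controller", "game"},
    {"console", "controller"},
    {"game"},
]

def pair_rules(baskets):
    """Return {(lhs, rhs): (support, confidence, lift)} for each ordered item pair."""
    n = len(baskets)
    item_count = Counter(item for basket in baskets for item in basket)
    pair_count = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), both in pair_count.items():
        for lhs, rhs in ((a, b), (b, a)):
            support = both / n                         # share of baskets with both items
            confidence = both / item_count[lhs]        # P(rhs | lhs)
            lift = confidence / (item_count[rhs] / n)  # confidence relative to P(rhs)
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules

rules = pair_rules(baskets)
for rule, (support, confidence, lift) in sorted(rules.items()):
    print(rule, round(support, 2), round(confidence, 2), round(lift, 2))
```

A lift above 1 suggests the two items appear together more often than chance would predict; a lift below 1 suggests the opposite.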


Data Mining is Self-sufficient

Data Mining does not need a Cube!

What it’s not

SSAS ≠ Cube

© blog.viXra.org
© recultured.com

Look, Ma, No Cube!

Cube: Dimensional Modelling
Pretty high barrier to entry
Build a Cube
Learn MDX
Construct Analyses …of the PAST

Data Mining:
Pretty low barrier to entry
Build Structure
Configure Model
Make Predictions …about the Future

© 2012 onlyHDwallPapers.com
© 2012 Microsoft Corporation

Why no Cube?

Data mining finds patterns in data
Cubes abstract much of the interesting information
Data Mine directly on your Data Warehouse
(or [“shudder”] on your operational database)
…but now we do have read-only mirrors!

When to Data Mine a Cube

Complex calculations determine outcome
Feed results in as new Cube data
(Caveat: Cannot feed data into original Cube)


Can it really be this easy?

Excel Data Mining Add-in


Contrasting Time Series Example


Caveat: Correlation ≠ Causation!

© 2011 xkcd.com


Caveat: Beware the Black Swan!

The Black Swan: The Impact of the Highly Improbable
Nassim Nicholas Taleb

Central Thesis: All significant events are unpredictable!


SQL 2008 Data Mining Videos …msdn

http://msdn.microsoft.com/en-us/library/dd776389%28v=SQL.100%29.aspx


Tutorial:


Logical Architecture


Demo

Your one-stop-shop for data mining

Data Mining Designer

But is it Respectable?

Is it all just smoke and mirrors?
Or…
Was Data Mining invented just to make Astrology look respectable?


How it works inside

Public Domain (Wikipedia Commons)


Decision Tree Algorithm

Correlation Tree Node


Cluster & Association Algorithms


Naïve Bayes Algorithm

• Simple, fast, surprisingly accurate
• “Naïve”: attributes assumed to be independent of each other
• Pervasive use throughout Data Mining
• Uses Bayes Law:

P(Result | Data) = P(Data | Result) * P(Result) / P(Data)
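The independence assumption is what keeps the algorithm simple: P(Data | Result) becomes a product of per-attribute frequencies. The following is a minimal sketch of that idea for categorical attributes, not the SQL Server implementation. The attribute values and labels are hypothetical, and no smoothing is applied, so an unseen value drives a class score to zero.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_count = Counter(labels)
    value_count = defaultdict(Counter)  # (label, attribute index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_count[(label, i)][value] += 1
    return class_count, value_count

def posterior(row, class_count, value_count):
    """P(label | row) under the naive independence assumption (no smoothing)."""
    total = sum(class_count.values())
    scores = {}
    for label, n in class_count.items():
        score = n / total  # P(Result)
        for i, value in enumerate(row):
            score *= value_count[(label, i)][value] / n  # P(attribute value | Result)
        scores[label] = score
    evidence = sum(scores.values())  # P(Data), used as the normalising constant
    return {label: s / evidence for label, s in scores.items()}

# Hypothetical training data: (age band, location) -> outcome.
rows = [("young", "city"), ("young", "rural"), ("old", "city"),
        ("old", "rural"), ("young", "city"), ("old", "rural")]
labels = ["buy", "buy", "buy", "no", "no", "no"]

class_count, value_count = train(rows, labels)
result = posterior(("young", "city"), class_count, value_count)
print(result)
```

For the query ("young", "city") the two class scores are 2/9 and 1/18, so the normalised posterior favours "buy" four to one.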

Naïve Bayes Algorithm

P(Girl | Trousers) = ?

P(Trousers | Girl) = 20 / 40
P(Girl) = 40 / 100
P(Trousers) = 80 / 100

P(Girl | Trousers) = P(Trousers | Girl) P(Girl) / P(Trousers)
= (20 / 40) (40 / 100) / (80 / 100) = 20 / 80 = 0.25
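The arithmetic of the trousers example can be checked with exact fractions. This short sketch only restates the slide's numbers; the function name `bayes` is chosen here for illustration.

```python
from fractions import Fraction

def bayes(p_data_given_result, p_result, p_data):
    """Bayes Law: P(Result | Data) = P(Data | Result) * P(Result) / P(Data)."""
    return p_data_given_result * p_result / p_data

p_trousers_given_girl = Fraction(20, 40)
p_girl = Fraction(40, 100)
p_trousers = Fraction(80, 100)

p_girl_given_trousers = bayes(p_trousers_given_girl, p_girl, p_trousers)
print(p_girl_given_trousers, float(p_girl_given_trousers))
```

The same answer follows from counting directly: 20 of the 80 trouser wearers are girls.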

Neural Network Algorithm

[Figure: network diagram with layers Input Neurons, Hidden Neurons and Output Neurons; node labels Loc, Sex, Age, Weight, Weight2, Weight3, Buy, No; connections labelled W]

• Multilayer Perceptron Network, aka Back-Propagated Delta Rule Network
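As a rough illustration of what a multilayer perceptron trained by back-propagation does, here is a minimal pure-Python sketch with one hidden layer and a single output. The slide's diagram shows two output neurons (Buy, No); one output is used here to keep the sketch short. The input values, layer sizes, learning rate and random seed are all assumptions, not taken from the talk.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Return (hidden activations, output activation) for one input vector."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train_step(x, target, w_hidden, w_out, rate=0.5):
    """One back-propagation update (delta rule applied layer by layer)."""
    hidden, output = forward(x, w_hidden, w_out)
    delta_out = (target - output) * output * (1.0 - output)
    # Hidden deltas use the output weights before they are updated.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    for j, h in enumerate(hidden):
        w_out[j] += rate * delta_out * h
    for j, ws in enumerate(w_hidden):
        for i, xi in enumerate(x):
            ws[i] += rate * delta_hidden[j] * xi

random.seed(0)
# Three inputs (a bias term plus two hypothetical scaled attributes), two hidden neurons.
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-1, 1) for _ in range(2)]

x, target = [1.0, 0.3, 0.8], 1.0  # hypothetical customer who did buy
before = forward(x, w_hidden, w_out)[1]
for _ in range(200):
    train_step(x, target, w_hidden, w_out)
after = forward(x, w_hidden, w_out)[1]
print(before, after)
```

Repeating the update on this one example moves the output toward the target; a real model would cycle over many training cases and hold some back for testing.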

Comparing Algorithms

Lift Chart Operation: Bike Buyers

Population Random: 50%
Targeted Data Mining: 85%
Ideal: 100%
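One way to read a lift chart: rank the population by model score, contact the top fraction, and ask what share of all actual buyers was reached. The sketch below computes that share. The scores and outcomes are invented, and reading the slide's percentages as "buyers reached when half the population is targeted" is an assumption.

```python
def cumulative_gain(scores, actual, fraction):
    """Share of all positives found in the top `fraction` of cases ranked by score."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    cutoff = round(len(ranked) * fraction)
    found = sum(label for _, label in ranked[:cutoff])
    return found / sum(actual)

# Hypothetical model scores and true outcomes (1 = bought a bike).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

model_gain = cumulative_gain(scores, actual, 0.5)  # targeting the top half by score
random_gain = 0.5                                  # random targeting reaches about half
lift = model_gain / random_gain
print(model_gain, lift)
```

Comparing two algorithms then amounts to comparing their gain at the same targeting fraction.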

Time Series

Combines 2 algorithms
ARTxp: Short-term prediction
ARIMA: Long-term prediction


ARTxp: Auto-Regressive Tree with Cross-Prediction


ARIMA: Auto-Regressive Integrated Moving Average

Handles dependencies
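The auto-regressive idea is that each value depends on the values before it. A minimal sketch, assuming the simplest case of one lag, fits y[t] = a + b * y[t-1] by least squares and rolls the fit forward. The series is synthetic, and this is not the ARTxp or ARIMA implementation in SQL Server.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

def forecast(series, steps):
    """Roll the fitted one-lag model forward `steps` periods."""
    a, b = fit_ar1(series)
    out, last = [], series[-1]
    for _ in range(steps):
        last = a + b * last
        out.append(last)
    return out

# Synthetic series generated exactly by y[t] = 2 + 0.5 * y[t-1].
series = [10.0]
for _ in range(9):
    series.append(2 + 0.5 * series[-1])

a, b = fit_ar1(series)
print(a, b, forecast(series, 3))
```

Because the synthetic series follows the model exactly, the fit recovers a = 2 and b = 0.5 up to rounding; real data would leave residuals, which is where the moving-average terms come in.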


ARIMA

Handles shocks


ARIMA

Detect periodicity:

Fourier Transform

Detect periodicity:
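A discrete Fourier transform exposes periodicity as a peak at the matching frequency. The sketch below uses a plain O(n²) transform on a synthetic monthly series and reports the period of the strongest component. How the product itself applies the transform is not described in the slides, so treat this only as an illustration of the principle.

```python
import cmath
import math

def dominant_period(series):
    """Return the period (in samples) of the strongest non-constant DFT component."""
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]  # remove the constant offset
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centred)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

# Synthetic monthly series: four years with a 12-month cycle around a level of 100.
series = [100 + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
period = dominant_period(series)
print(period)
```

With 48 samples and a 12-sample cycle, the peak falls at frequency index 4, giving a period of 12.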


ARTxp and ARIMA Blended

Default PREDICTION_SMOOTHING = 0.5


ARTxp and ARIMA Blended

PREDICTION_SMOOTHING = 0.2
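PREDICTION_SMOOTHING controls how the two forecasts are mixed: values near 0 lean on ARTxp, values near 1 lean on ARIMA. The sketch below shows the idea with a constant weight. The constant weighting and the forecast numbers are assumptions made for illustration; the slides do not give the actual weighting curve.

```python
def blend(artxp_forecast, arima_forecast, prediction_smoothing=0.5):
    """Weighted mix: 0 -> ARTxp only, 1 -> ARIMA only (constant weight is an assumption)."""
    if not 0.0 <= prediction_smoothing <= 1.0:
        raise ValueError("prediction_smoothing must be between 0 and 1")
    w = prediction_smoothing
    return [(1 - w) * a + w * b for a, b in zip(artxp_forecast, arima_forecast)]

artxp = [100.0, 104.0, 109.0]  # hypothetical ARTxp forecasts
arima = [100.0, 102.0, 104.0]  # hypothetical ARIMA forecasts

default_mix = blend(artxp, arima)       # PREDICTION_SMOOTHING = 0.5
artxp_heavy = blend(artxp, arima, 0.2)  # PREDICTION_SMOOTHING = 0.2
print(default_mix, artxp_heavy)
```

Lowering the parameter from 0.5 to 0.2 pulls the blended forecast toward the ARTxp values.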

Take-Aways

Off-the-shelf toolkit
No Cube required
No code required
Good default parameters
Easily explored models
Change parameters, filter input, compare lift
Excel Add-In

Caveats:

Correlation ≠ Causation
Beware the Black Swan

References

Data Mining Add-ins
http://office.microsoft.com/en-us/excel-help/data-mining-add-ins-HA010342915.aspx#_Toc257717762

Analysis Services - Data Mining Videos
http://msdn.microsoft.com/en-us/library/dd776389(v=SQL.100).aspx

SQL Server Data Mining Home
http://www.sqlserverdatamining.com/ssdm/

Microsoft Contoso BI Demo Dataset for Retail Industry
http://www.microsoft.com/downloads/en/details.aspx?displaylang=en&FamilyID=868662dc-187a-4a85-b611-b7df7dc909fc

What Every IT Manager Should Know About Business Users’ Real Needs for BI
http://docs.media.bitpipe.com/io_25x/io_25515/item_392177/Tableau_S_MktgLtr_BI_IT.pdf

An Introduction to Data Mining: Discovering hidden value in your data warehouse
http://www.thearling.com/text/dmwhite/dmwhite.htm


Related Content

Database and Business Intelligence Track: All Sessions

Exam 467 (new) or 460 (upgrade) to MCSE Business Intelligence

Find Me Later At the Friday 11AM Meet and Greet


© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.