DMDW Lesson 02 - Basics with Adventure Works

State-recognized University of Applied Sciences (staatlich anerkannte Fachhochschule)
Author I: M.Sc. Johannes Hofmeister
Author II: Dipl.-Inf. (FH) Johannes Hoppe
Date: 25.02.2011
Study and take off. (Studieren und durchstarten.)

Basics – Adventure Works

01 Adventure Works

Slide 3

Resources

› Microsoft Visual Studio 2008 (NOT 2010)
› SQL Server 2008 (NOT Express Edition)
› MSSQL Server Community Projects & Samples: http://www.codeplex.com/SqlServerSamples
› Adventure Works Databases for SQL Server 2008: http://msftdbprodsamples.codeplex.com/
› Adventure Works Sample Data Warehouse Documentation: http://technet.microsoft.com/en-us/library/ms124623(SQL.90).aspx
› SQL Authority Adventure Works Tutorial: http://blog.sqlauthority.com/2008/08/10/sql-server-2008-download-and-install-samples-database-adventureworks-2005-detail-tutorial/

Slide 5

Adventure Works

› Example database of a fictional company named "Adventure Works"
› SSAS integration (SQL Server Analysis Services)
› Finance
  › Franchises
  › Currency rates (daily exchange rates)
› Sales
  › Resellers
  › Contracts

Slide 6

Available Scenarios

› DM/DW Scenarios
› Mining Scenarios
  › Forecasting bikes by region/time
  › Targeted mailing campaign: algorithms for demographic data (age, region, volume, etc.)
  › Market basket analysis ("suggesting a product")
  › Sequence clustering

Slide 7

Available Scenarios

› OLAP Scenarios
  › Financial Reporting
  › Actual versus Budget
  › Product Profitability Analysis
  › Sales Force Performance
  › Trend/Growth Analysis
  › Promotion Effectiveness

Source: http://msdn.microsoft.com/en-us/library/ms124623.aspx

Slide 8

Adventure Works Data Warehouse

› Data from OLTP DB + additional "external" data source
› Synchronization via available SSIS packages
› Copy of actual (live) data
› Can be changed, merged for mining

02 Simple Data Mining with View

Slide 9

Homework!

Slide 10

Data Mining Applied with AW DB

› Read and try it out!
› Preparation
  › 1. Get Visual Studio 2008
  › 2. Get SQL Server 2008
  › 3. Install Adventure Works Database (DW)

Homework: http://msdn.microsoft.com/en-us/library/ms167167.aspx

Data Mining Applied with AW DB

Don't get confused*

"SQL Server Business Intelligence Development Studio" is the combination of
Microsoft Visual Studio 2008
+ SQL Server 2008 (not Express)
+ the feature "Business Intelligence"

(*Everybody is confused here the first time!)

Slide 13

A look into the database

› Adventure Works 2008
› AdventureWorksDW2008

ProductCategory

vDMPrep

vTargetMail

Slide 14

Table: ProductCategory

Id  Name         rowguid              Modified
--- ------------ -------------------- ----------
1   Bikes        CFBDA25C-DF71-[...]  1998-06-01
2   Components   C657828D-D808-[...]  1998-06-01
3   Clothing     10A7C342-CA82-[...]  1998-06-01
4   Accessories  2BE3BE36-D9A2-[...]  1998-06-01

Slide 15

View: vTargetMail

-- vTargetMail supports targeted mailing data model
-- Uses vDMPrep to determine if a customer buys a bike and joins to DimCustomer
CREATE VIEW [dbo].[vTargetMail]
AS
    SELECT
        c.[CustomerKey],
        -- [...]
        CASE x.[Bikes] WHEN 0 THEN 0 ELSE 1 END AS [BikeBuyer]
    FROM [dbo].[DimCustomer] c
    INNER JOIN (
        SELECT
            [CustomerKey], [Region], [Age],
            Sum(CASE [EnglishProductCategoryName] WHEN 'Bikes' THEN 1 ELSE 0 END) AS [Bikes]
        FROM [dbo].[vDMPrep]
        GROUP BY [CustomerKey], [Region], [Age]
    ) AS [x]
        ON c.[CustomerKey] = x.[CustomerKey];
GO
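
To see how the view derives the BikeBuyer flag, the same logic can be tried on a tiny in-memory database. The following is only a sketch: it uses Python's built-in sqlite3 module instead of SQL Server, replaces DimCustomer and vDMPrep with simplified stand-in tables, and all rows are invented for illustration.

```python
import sqlite3

# Simplified stand-ins for DimCustomer and vDMPrep (assumption: only the
# columns needed for the BikeBuyer flag; the rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY);
CREATE TABLE vDMPrep (
    CustomerKey INTEGER,
    Region TEXT,
    Age INTEGER,
    EnglishProductCategoryName TEXT
);
INSERT INTO DimCustomer VALUES (1), (2), (3);
INSERT INTO vDMPrep VALUES
    (1, 'Europe', 34, 'Bikes'),
    (1, 'Europe', 34, 'Clothing'),
    (1, 'Europe', 34, 'Bikes'),
    (2, 'Pacific', 51, 'Accessories');
""")

# Same shape as vTargetMail: count bike purchases per customer in a
# subquery, then collapse the count to a 0/1 flag.
bike_buyers = conn.execute("""
SELECT c.CustomerKey,
       CASE x.Bikes WHEN 0 THEN 0 ELSE 1 END AS BikeBuyer
FROM DimCustomer c
INNER JOIN (
    SELECT CustomerKey, Region, Age,
           SUM(CASE EnglishProductCategoryName WHEN 'Bikes' THEN 1 ELSE 0 END) AS Bikes
    FROM vDMPrep
    GROUP BY CustomerKey, Region, Age
) AS x ON c.CustomerKey = x.CustomerKey
ORDER BY c.CustomerKey
""").fetchall()

print(bike_buyers)  # [(1, 1), (2, 0)]
```

Customer 1 bought two bikes but is flagged only once, customer 2 bought no bike, and customer 3 has no rows in the stand-in vDMPrep table, so the INNER JOIN drops that customer entirely.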

Slide 16

Create Project

› Add Source
› Add Source View
› Add Mining Structure
› Add Models (Algorithms)
  › Decision Trees
  › (Clustering)
  › (Naive Bayes)

03 Algorithm: Decision Tree

Slide 17

Slide 18

Algorithm Overview

› Used to identify relationships
  › Column 1, Column 2, Column 3
› Most cases: 4 steps
  › Analyze
  › Create model (training)
  › Verify model (testing)
  › Predict future data

Slide 19

Decision Trees

› Also: Classification Trees
› Partition data
› Can detect non-linear relationships
› Machine learning technique
› Separate the data into a training set and a testing set
  › The training set is used to create the model based on certain criteria
  › The testing set is used to verify the model
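
The separation into training and testing data can be sketched in a few lines of Python. This is a minimal illustration and not the SSAS decision tree algorithm: `split_rows`, `train_stump`, and `accuracy` are hypothetical helper names, the data is invented, and the "tree" is a single split on age (a decision stump).

```python
import random

def split_rows(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and return (training, testing)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def predict(threshold, age):
    """One-level tree: predict buyer (1) when age is below the threshold."""
    return 1 if age < threshold else 0

def train_stump(training):
    """Try every age seen in training as a threshold; keep the one with fewest errors."""
    best_threshold, best_errors = None, None
    for threshold in sorted({age for age, _ in training}):
        errors = sum(predict(threshold, age) != label for age, label in training)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def accuracy(threshold, rows):
    """Share of rows whose predicted label matches the real label."""
    hits = sum(predict(threshold, age) == label for age, label in rows)
    return hits / len(rows)

# Invented toy data: (age, bike_buyer)
data = [(22, 1), (25, 1), (31, 1), (35, 1), (38, 1),
        (41, 0), (47, 0), (52, 0), (60, 0), (66, 0)]

training, testing = split_rows(data)         # separate the data
threshold = train_stump(training)            # create model (training)
test_accuracy = accuracy(threshold, testing) # verify model (testing)
new_customer = predict(threshold, 29)        # predict future data

print(threshold, test_accuracy, new_customer)
```

The testing rows are never shown to `train_stump`, so `test_accuracy` gives an honest estimate of how the learned threshold behaves on unseen customers.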

Slide 20

Decision Trees: Example

› Overall: 2.6 % response rate
  › Male: 3.0 %
    › Income > $30 000: 3.6 %
    › Income < $30 000: 2.3 %
  › Female: 2.9 %
    › Age < 40: 3.2 %
    › Age > 40: 3.8 %
› Response rate > 3.5 %
  › Males: income > $30 000
  › Females: age 40+

Trained Tree
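
The percentages in such a tree are response rates per segment: responders divided by contacted people within each branch. A hedged sketch of that calculation follows; `response_rates` is a hypothetical helper, and the mailing list is invented toy data, not the figures from the slide.

```python
from collections import defaultdict

def response_rates(rows, key):
    """Return {segment: responders / contacted} for the segments given by key(row)."""
    contacted = defaultdict(int)
    responded = defaultdict(int)
    for row in rows:
        segment = key(row)
        contacted[segment] += 1
        responded[segment] += row["responded"]
    return {segment: responded[segment] / contacted[segment] for segment in contacted}

# Invented toy mailing list (assumption: 1 = responded, 0 = did not respond).
mailing = [
    {"gender": "M", "income": 45000, "responded": 1},
    {"gender": "M", "income": 52000, "responded": 0},
    {"gender": "M", "income": 21000, "responded": 0},
    {"gender": "M", "income": 18000, "responded": 0},
    {"gender": "F", "income": 39000, "responded": 1},
    {"gender": "F", "income": 27000, "responded": 0},
]

# First level of the tree: split by gender.
by_gender = response_rates(mailing, key=lambda r: r["gender"])

# Second level, male branch only: split by income at $30 000.
males = [r for r in mailing if r["gender"] == "M"]
by_income = response_rates(males, key=lambda r: "high" if r["income"] > 30000 else "low")

print(by_gender)  # {'M': 0.25, 'F': 0.5}
print(by_income)  # {'high': 0.5, 'low': 0.0}
```

A tree-building algorithm repeats this kind of comparison for many candidate splits and keeps the ones that separate high and low response rates best.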

Slide 21

Pros and Cons of Decision Trees

› Pros
  › Very flexible, white-box model
  › Occam's razor: KISS (Keep it simple, stupid!)
  › Little preparation and few resources needed
› Cons
  › Can be tuned to death
  › Long time to build
  › Select training data wisely
  › False training yields false results
  › Big tree might require disk swapping

THANK YOU FOR YOUR ATTENTION

Slide 22