Populating a Data Warehouse

Upload: aldous-lawrence

Post on 02-Jan-2016


Page 1: Populating a Data Warehouse. Overview Process Overview Methods of Populating a Data Warehouse Tools for Populating a Data Warehouse Populating a Data

Populating a Data Warehouse


Overview

Process Overview

Methods of Populating a Data Warehouse

Tools for Populating a Data Warehouse

Populating a Data Warehouse by Using DTS


Process Overview

[Diagram: Source OLTP systems (Oracle, SQL Server, other; for example sales data and hardware data) feed a temporary data staging area. Steps shown: Validate, Gather, Make Data Consistent, Transform; Populate Data Warehouse; Distribute Data. The data warehouse distributes data to data marts (Sales, Service, Other).]


Validating Data

Validate and Correct Data at the Source Before You Import It

Determine and Correct Processes That Invalidate Data

Save Invalid Data to a Log for Review
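The last point, saving invalid data to a log for review, can be sketched as follows. This is a minimal illustration, not a prescribed method: the validation rules, the function name validate_rows, and the sample rows are assumptions made for the example.

```python
def validate_rows(rows):
    """Split rows into valid rows and a log of rejected rows with reasons."""
    valid, invalid_log = [], []
    for row in rows:
        problems = []
        # Hypothetical rule 1: the buyer name must not be empty.
        if not row.get("buyer_name", "").strip():
            problems.append("missing buyer_name")
        # Hypothetical rule 2: total_sales must be a non-negative number.
        try:
            if float(row.get("total_sales", "")) < 0:
                problems.append("negative total_sales")
        except ValueError:
            problems.append("total_sales is not a number")
        if problems:
            # Keep the rejected row and the reasons so it can be reviewed later.
            invalid_log.append({"row": row, "problems": problems})
        else:
            valid.append(row)
    return valid, invalid_log


rows = [
    {"buyer_name": "Barr, Adam", "total_sales": "17.60"},
    {"buyer_name": "", "total_sales": "52.80"},
    {"buyer_name": "Chai, Sean", "total_sales": "n/a"},
]
valid, invalid_log = validate_rows(rows)
```

Only the first row reaches the load; the other two are kept in invalid_log with the reason they were rejected.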


Making Data Consistent

Data Can Be Inconsistent in Several Ways:

Data in each source is consistent, but you want to represent it differently in the data warehouse

Data is represented differently in different sources

You Can Make Data Consistent by:

Translating codes or values to readable strings

Converting multiple versions of the same information into a single representation
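Both techniques can be sketched in a few lines. The example below assumes, hypothetically, that different sources spell a state as a two-letter code, an abbreviation, or a full name; the variant spellings and the function name normalize_state are illustrative only.

```python
# Codes translated to readable strings.
STATE_NAMES = {"FL": "Florida", "WY": "Wyoming", "AR": "Arkansas"}

# Hypothetical alternative spellings found in different sources,
# all mapped to a single code.
STATE_VARIANTS = {
    "fla.": "FL", "florida": "FL",
    "wyo.": "WY", "wyoming": "WY",
    "ark.": "AR", "arkansas": "AR",
}


def normalize_state(value):
    """Return one readable representation for any known spelling."""
    key = value.strip()
    if key.upper() in STATE_NAMES:
        code = key.upper()
    else:
        code = STATE_VARIANTS.get(key.lower())
    if code is None:
        raise ValueError(f"unknown state value: {value!r}")
    return STATE_NAMES[code]
```

For example, normalize_state("FL"), normalize_state(" fla. ") and normalize_state("florida") all return "Florida", so the warehouse holds a single representation.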


Transforming Data

Change (reg_id values 2, 4, 6 become II, IV, VI)

buyer_name       reg_id   total_sales
Barr, Adam       2        17.60
Chai, Sean       4        52.80
O’Melia, Erin    6        8.82
...              ...      ...

buyer_name       reg_id   total_sales
Barr, Adam       II       17.60
Chai, Sean       IV       52.80
O’Melia, Erin    VI       8.82
...              ...      ...

Combine (buyer_first and buyer_last become buyer_name)

buyer_first   buyer_last   reg_id   total_sales
Adam          Barr         2        17.60
Sean          Chai         4        52.80
Erin          O’Melia      6        8.82
...           ...          ...      ...

buyer_name       reg_id   total_sales
Barr, Adam       2        17.60
Chai, Sean       4        52.80
O’Melia, Erin    6        8.82
...              ...      ...

Calculate (price_id multiplied by qty_id gives total_sales)

buyer_name       price_id   qty_id
Barr, Adam       .55        32
Chai, Sean       1.10       48
O’Melia, Erin    .98        9
...              ...        ...

buyer_name       price_id   qty_id   total_sales
Barr, Adam       .55        32       17.60
Chai, Sean       1.10       48       52.80
O’Melia, Erin    .98        9        8.82
...              ...        ...      ...
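The three kinds of transformation on this slide (change, combine, calculate) can be sketched with the slide's own sample values. The function names are hypothetical, and Decimal is used here only so that the currency arithmetic stays exact.

```python
from decimal import Decimal

# Change: recode a value (region number to Roman numeral, as on the slide).
ROMAN_REGION = {2: "II", 4: "IV", 6: "VI"}


def change(reg_id):
    return ROMAN_REGION[reg_id]


# Combine: merge two source columns into one destination column.
def combine(buyer_first, buyer_last):
    return f"{buyer_last}, {buyer_first}"


# Calculate: derive a new column from existing ones.
def calculate(price, qty):
    return Decimal(price) * qty


row = {
    "buyer_name": combine("Adam", "Barr"),
    "reg_id": change(2),
    "total_sales": calculate("0.55", 32),
}
```

With these inputs, row holds buyer_name "Barr, Adam", reg_id "II", and total_sales 17.60, matching the first row of the slide's tables.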


Methods of Populating a Data Warehouse

Select the Method of Populating a Data Warehouse That Suits Your Business Needs

Method 1: Validate, combine, and transform data in a temporary data staging area

Method 2: Validate, combine, and transform data during the loading process

Migrate Data During Periods of Relatively Low System Use


Tools for Populating a Data Warehouse

What Is the Appropriate Tool to Use

Transact-SQL Query

Distributed Query

bcp Utility and the BULK INSERT Statement

DTS


What Is the Appropriate Tool to Use

Format of Source and Destination Data

Location of Source and Destination Data

Import or Export of Database Objects

Frequency of Data Transfer

Interface Preference

Tool Performance


Transact-SQL Query

Customer

FirstName   LastName
Steve       Johnson
Douglas     Smith
Les         Wilson
Paul        Salinger

CustomerSummary

FullName
Johnson, Steve
Smith, Douglas
Wilson, Les
Salinger, Paul

USE northwind_mart
SELECT Lastname + ', ' + Firstname AS Fullname
INTO CustomerSummary
FROM Northwind.dbo.Customer
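To try the same idea without a SQL Server instance, the sketch below reproduces the query with Python's built-in sqlite3 module. SQLite is a stand-in chosen for this example, not part of the original material: it uses CREATE TABLE ... AS SELECT instead of SELECT ... INTO, and || instead of + for string concatenation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source table with the sample rows from the slide.
conn.execute("CREATE TABLE Customer (FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [("Steve", "Johnson"), ("Douglas", "Smith"),
     ("Les", "Wilson"), ("Paul", "Salinger")],
)

# SQLite equivalent of SELECT ... INTO: create and fill the table in one step.
conn.execute(
    "CREATE TABLE CustomerSummary AS "
    "SELECT LastName || ', ' || FirstName AS FullName FROM Customer"
)

full_names = [r[0] for r in conn.execute(
    "SELECT FullName FROM CustomerSummary ORDER BY rowid")]
```

full_names then holds the four combined names shown in the CustomerSummary table.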


Distributed Query

USE northwind_mart
SELECT productname, companyname
INTO item_dim
FROM StockServer.sales.dbo.products p
JOIN AccountingServer.sales.dbo.suppliers s
ON p.supplierid = s.supplierid

[Diagram: the Products table in the Sales database on StockServer and the Suppliers table in the Sales database on AccountingServer are joined into the Item_Dim table on the local SQL Server.]
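The shape of this query, one statement joining tables that live in different databases, can be imitated locally with SQLite's ATTACH DATABASE. This is only an analogy under stated assumptions: the attached in-memory databases stand in for the two remote servers, and the sample product and supplier rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two separate in-memory databases stand in for StockServer and AccountingServer.
conn.execute("ATTACH DATABASE ':memory:' AS stock")
conn.execute("ATTACH DATABASE ':memory:' AS accounting")

conn.execute("CREATE TABLE stock.products (productname TEXT, supplierid INTEGER)")
conn.execute("CREATE TABLE accounting.suppliers (supplierid INTEGER, companyname TEXT)")

# Hypothetical sample rows.
conn.executemany("INSERT INTO stock.products VALUES (?, ?)",
                 [("Widget", 1), ("Gadget", 2)])
conn.executemany("INSERT INTO accounting.suppliers VALUES (?, ?)",
                 [(1, "Acme Supply"), (2, "Globex")])

# Join across the two databases and materialize the result locally.
conn.execute(
    "CREATE TABLE item_dim AS "
    "SELECT p.productname, s.companyname "
    "FROM stock.products p "
    "JOIN accounting.suppliers s ON p.supplierid = s.supplierid"
)

item_dim = conn.execute(
    "SELECT productname, companyname FROM item_dim ORDER BY productname"
).fetchall()
```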


bcp Utility and the BULK INSERT Statement

bcp Utility

bcp accounting.dbo.orders in Orderstbl.dat -c -t, -r \n -Smysqlserver -Usa -Pmypassword

BULK INSERT Statement

BULK INSERT Accounting.dbo.orders
FROM 'C:\ordersdir\orderstble.dat'
WITH (
DATAFILETYPE = 'char',
FIELDTERMINATOR = '|',
ROWTERMINATOR = '|\n'
)
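To make the terminator options concrete, the sketch below parses character data that uses '|' as the field terminator and '|\n' as the row terminator, then loads it into a table. It is an illustration in Python with SQLite, not how bcp or BULK INSERT work internally; the three-column layout of the orders data is an assumption made for the example.

```python
import sqlite3


def parse_bulk_file(text, field_terminator="|", row_terminator="|\n"):
    """Split character data into rows and fields using the given terminators."""
    rows = []
    for record in text.split(row_terminator):
        if record:  # the text after the final row terminator is empty
            rows.append(record.split(field_terminator))
    return rows


# Hypothetical file contents: every row ends with '|' followed by a newline.
sample = "1|Barr, Adam|17.60|\n2|Chai, Sean|52.80|\n"
rows = parse_bulk_file(sample)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, buyer_name TEXT, total_sales REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
loaded = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```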


DTS

When to Use DTS

DTS Data Source and Destination Types

OLE DB

ODBC

ASCII text file

Custom

HTML

Spreadsheet

DTS Tools

DTS Import and Export wizards

DTS Designer

dtsrun utility


Populating a Data Warehouse by Using DTS

Building a DTS Package

Transforming Data by Using an ActiveX Script

Transforming Data by Using a Lookup Query

Defining Transactions

Tracking Data Lineage

Creating a DTS Package Programmatically


Building a DTS Package

Mapping Source and Destination Data

Defining Data Transformation Tasks

Creating and Saving a DTS Package

Executing a DTS Package

Scheduling and Securing a DTS Package


Mapping Source and Destination Data

Mapping Columns

Decide which columns to copy

Choose the columns in the target database that map to the source columns

Mapping Data Types

Specify transformation rules

Specify levels of data conversion
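Column and data type mapping can be pictured as a table that names, for each source column to copy, the destination column and the conversion to apply. The sketch below is a plain Python illustration; the column names and the COLUMN_MAP structure are assumptions, not a DTS format.

```python
from decimal import Decimal

# source column -> (destination column, type conversion)
# Source columns that are not listed here are not copied.
COLUMN_MAP = {
    "Lastname": ("buyer_last", str),
    "Firstname": ("buyer_first", str),
    "Total": ("total_sales", Decimal),
}


def map_row(source_row):
    """Copy the mapped columns, converting each value to the destination type."""
    destination_row = {}
    for source_col, (dest_col, convert) in COLUMN_MAP.items():
        destination_row[dest_col] = convert(source_row[source_col])
    return destination_row


mapped = map_row(
    {"Lastname": "Barr", "Firstname": "Adam", "Total": "17.60", "Fax": "n/a"})
```

The Fax column is dropped because it has no mapping, and the Total string arrives in the destination as a Decimal.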


Defining Data Transformation Tasks

DTS Packages Contain Tasks

A Task Can:

Execute a Transact-SQL statement

Execute a script

Launch an external application

Transfer SQL Server 7.0 objects

Execute or retrieve results from a DTS package


Creating and Saving a DTS Package

Creating a DTS Package

By using DTS wizards

By using DTS Designer

By using a COM interface exposed by DTS

Saving a DTS Package

COM-structured storage file

Microsoft Repository

SQL Server msdb database


Executing a DTS Package

You Can Execute a DTS Package by Using SQL Server Enterprise Manager or the dtsrun Command Prompt Utility

File Storage Location Determines the dtsrun Syntax

dtsrun /sAccounts /uJose /nOrdersImport


Scheduling and Securing a DTS Package

Scheduling a DTS Package

Use DTS Import or DTS Export wizards when you save the DTS package to the msdb database

Use SQL Server Enterprise Manager when you use the dtsrun command prompt utility

Implementing DTS Package Security

Login permissions

Owner and user passwords


Demonstration: Transferring Data by Using DTS


Transforming Data by Using an ActiveX Script

Why Use an ActiveX Script

How to Use an ActiveX Script

Define a function to contain the transformation script

Specify the destination column

Specify the source columns

Use operators and VBScript or JScript functions and control-of-flow statements

Set the return code value for the function

How to Handle Errors with Return Codes


Examples of ActiveX Scripts

Customer

FirstName   LastName
Steve       Johnson
Douglas     Smith
Les         Wilson
Paul        Salinger

CustomerSummary

FullName
Johnson, Steve
Smith, Douglas
Wilson, Les
Salinger, Paul

Function Main()
    DTSDestination("FullName") = DTSSource("Lastname") + ", " + DTSSource("Firstname")
    Main = DTSTransformStat_OK
End Function
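The pattern in this script, a function that fills the destination row from the source row and reports a status through its return value, can be sketched in Python as follows. The status constants and the skip rule are hypothetical stand-ins for illustration; they are not the DTS return code values.

```python
# Hypothetical status values returned by the transformation function.
TRANSFORM_OK = "ok"
TRANSFORM_SKIP_ROW = "skip_row"


def transform(source, destination):
    """Fill the destination row from the source row and return a status."""
    last = (source.get("Lastname") or "").strip()
    first = (source.get("Firstname") or "").strip()
    if not last or not first:
        # Signal that this row should not be written to the destination.
        return TRANSFORM_SKIP_ROW
    destination["FullName"] = f"{last}, {first}"
    return TRANSFORM_OK


def run_transform(rows):
    """Apply transform to every row and act on the returned status."""
    loaded, skipped = [], []
    for source in rows:
        destination = {}
        if transform(source, destination) == TRANSFORM_OK:
            loaded.append(destination)
        else:
            skipped.append(source)
    return loaded, skipped


loaded, skipped = run_transform([
    {"Firstname": "Steve", "Lastname": "Johnson"},
    {"Firstname": "", "Lastname": "Smith"},
])
```

The caller, not the transformation function, decides what to do with a row based on the status it gets back.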


Demonstration: Transforming Data by Using an ActiveX Script


Transforming Data by Using a Lookup Query

Source Data: Customer_source

Name          State
D. Smith      FL
L. Wilson     WY
P. Salinger   AR

Lookup Table: State_lookup

Abbreviation  State
FL            Florida
WY            Wyoming
AR            Arkansas

Destination Data (after Transform): Customer_dim

Name          State
D. Smith      Florida
L. Wilson     Wyoming
P. Salinger   Arkansas


Implementing a Lookup Query

Set Up Connections to Source, Destination, and Lookup Tables

Create a Task, and Specify the Source and Destination

Add a Lookup Query Definition

Map the Source and Destination Columns, and Call the Lookup Query from the ActiveX Script
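The steps above can be sketched with the sample tables from the previous slide. This is a Python and SQLite illustration rather than a DTS package: the lookup is a parameterized query against State_lookup that is called once per source row, and the function name lookup_state is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Lookup table from the slide.
conn.execute(
    "CREATE TABLE State_lookup (Abbreviation TEXT PRIMARY KEY, State TEXT)")
conn.executemany(
    "INSERT INTO State_lookup VALUES (?, ?)",
    [("FL", "Florida"), ("WY", "Wyoming"), ("AR", "Arkansas")],
)


def lookup_state(abbreviation):
    """Parameterized lookup query; returns None when there is no match."""
    row = conn.execute(
        "SELECT State FROM State_lookup WHERE Abbreviation = ?",
        (abbreviation,),
    ).fetchone()
    return row[0] if row else None


customer_source = [("D. Smith", "FL"), ("L. Wilson", "WY"), ("P. Salinger", "AR")]

# Transform: copy the name and replace the abbreviation with the looked-up state.
customer_dim = [(name, lookup_state(abbr)) for name, abbr in customer_source]
```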


Defining Transactions

You Must Explicitly Add a Step or Task to the Transaction

You Can Specify When a Transaction Commits

DTS Only Supports One Transaction Per Package

MS DTC Must Be Running

The Data Provider for the Data Destination Must Support Transactions
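The all-or-nothing behavior a transaction gives a load can be sketched with a local SQLite transaction. This is a simplified analogy: it uses a single local connection, not MS DTC or a distributed transaction, and the table layout and function name are assumptions.

```python
import sqlite3


def load_in_one_transaction(conn, rows):
    """Insert all rows or none of them; return True when the load committed."""
    try:
        # The connection context manager commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total_sales REAL NOT NULL)")

committed = load_in_one_transaction(conn, [(1, 17.60), (2, 52.80)])
# The second batch repeats order_id 3, so the whole batch is rolled back.
rolled_back = not load_in_one_transaction(conn, [(3, 8.82), (3, 1.00)])
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After both calls the table still holds only the two rows from the first batch, because the failed batch left nothing behind.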


Tracking Data Lineage

Using Data Lineage

Tracks history of data at package and table row levels

Provides audit trail of data transformation and DTS package execution

Implementing Data Lineage

Create the table columns in the data warehouse

Add data lineage variables to the DTS package

Map data lineage source and destination columns

Viewing Data Lineage
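The idea of row-level lineage can be sketched as stamping every loaded row with an identifier of the package run that produced it. The sketch below is a generic Python illustration; the lineage_id column, the function names, and the run record are hypothetical and do not reproduce the DTS lineage variables.

```python
import uuid
from datetime import datetime, timezone


def start_package_run(package_name):
    """Create a record describing one execution of a package."""
    return {
        "lineage_id": str(uuid.uuid4()),
        "package": package_name,
        "started": datetime.now(timezone.utc).isoformat(),
    }


def stamp_rows(rows, run):
    """Add the run's lineage_id to each row before it is loaded."""
    return [dict(row, lineage_id=run["lineage_id"]) for row in rows]


run = start_package_run("OrdersImport")
stamped = stamp_rows([{"order_id": 1}, {"order_id": 2}], run)
```

Any row in the warehouse can then be traced back to the run record, which gives the audit trail described above.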


Demonstration: Defining Transactions and Tracking Data Lineage


Creating a DTS Package Programmatically

[Diagram: a DTS Package contains Connections, Tasks, Steps, and Global Variables. Steps have Precedence Constraints. Tasks shown: Create Process, Send Mail, Bulk Insert, Transfer Objects, Execute SQL, Data-driven Query, Custom, ActiveX, Data Pump. Other labels: Source, Destination, Columns.]
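The relationship between tasks, steps, and precedence constraints can be sketched as a small dependency graph: each step runs a task, and a step may only run after the steps it depends on. The sketch below uses Python's standard graphlib module (Python 3.9 or later); it illustrates the idea only and is not the DTS COM object model, and the step names are invented.

```python
from graphlib import TopologicalSorter


def run_package(tasks, precedence):
    """Run each step's task in an order that respects the precedence constraints.

    tasks:      step name -> callable that performs the task
    precedence: step name -> set of step names that must finish first
    """
    order = list(TopologicalSorter(precedence).static_order())
    results = [tasks[name]() for name in order]
    return order, results


# Hypothetical three-step package.
tasks = {
    "extract": lambda: "rows extracted",
    "transform": lambda: "rows transformed",
    "load": lambda: "rows loaded",
}
precedence = {"transform": {"extract"}, "load": {"transform"}}

order, results = run_package(tasks, precedence)
```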


Recommended Practices

Correct and Validate Data at the Source

Use an ActiveX Script or a Transact-SQL Script to Transfer and Transform Data

Use a Temporary Data Storage Area

Save and Store DTS Packages in the Microsoft Repository to Maintain Data Lineage


Lab A: Populating a Data Warehouse


Review

Process Overview

Methods of Populating a Data Warehouse

Tools for Populating a Data Warehouse

Populating a Data Warehouse by Using DTS