Populating a Data Warehouse
Overview
Process Overview
Methods of Populating a Data Warehouse
Tools for Populating a Data Warehouse
Populating a Data Warehouse by Using DTS
Process Overview
Figure: Data moves from source OLTP systems (Oracle, SQL Server, other) through a temporary data staging area into the data warehouse (sales data, hardware data), and from there to data marts (sales, service, other).
Process steps shown: Validate, Gather, Make Data Consistent; Transform, Populate Data; Distribute Data
Validating Data
Validate and Correct Data at the Source Before You Import It
Determine and Correct Processes That Invalidate Data
Save Invalid Data to a Log for Review
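The validation points above can be sketched in Python. This is only an illustration under stated assumptions: the column names come from the sample tables in this deck, while the validation rules, the logger name, and the sample rows are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("invalid_rows")  # hypothetical logger name


def validate_row(row):
    """Return a list of problems found in one source row (empty if valid)."""
    problems = []
    if not row.get("buyer_name"):
        problems.append("missing buyer_name")
    try:
        if float(row.get("total_sales", "")) < 0:
            problems.append("negative total_sales")
    except (TypeError, ValueError):
        problems.append("total_sales is not numeric")
    return problems


def split_valid_invalid(rows):
    """Separate rows that pass validation from rows saved for review."""
    valid, invalid = [], []
    for row in rows:
        problems = validate_row(row)
        if problems:
            # Save the invalid row and the reason to a log for later review.
            log.warning("rejected %r: %s", row, "; ".join(problems))
            invalid.append((row, problems))
        else:
            valid.append(row)
    return valid, invalid


# Hypothetical sample rows: one valid, two invalid.
rows = [
    {"buyer_name": "Barr, Adam", "total_sales": "17.60"},
    {"buyer_name": "", "total_sales": "52.80"},
    {"buyer_name": "Chai, Sean", "total_sales": "n/a"},
]
valid, invalid = split_valid_invalid(rows)
```

Only the rows in `valid` would continue to the load; the logged rows point back to the source process that needs correcting.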
Making Data Consistent
Data Can Be Inconsistent in Several Ways:
Data in each source is consistent, but you want to represent it differently in the data warehouse
Data is represented differently in different sources
You Can Make Data Consistent by:
Translating codes or values to readable strings
Converting multiple versions of the same information into a single representation
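Both techniques can be illustrated with a small Python sketch. The state codes match the lookup example later in this deck; the list of variant spellings is a hypothetical assumption about what different sources might contain.

```python
# Code table: translate codes to readable strings.
STATE_NAMES = {"FL": "Florida", "WY": "Wyoming", "AR": "Arkansas"}

# Hypothetical variants that different sources might use for the same state.
STATE_VARIANTS = {
    "fla.": "FL", "florida": "FL",
    "wyo.": "WY", "wyoming": "WY",
    "ark.": "AR", "arkansas": "AR",
}


def to_state_code(value):
    """Convert any known version of a state into a single representation."""
    cleaned = value.strip()
    if cleaned.upper() in STATE_NAMES:
        return cleaned.upper()
    # A KeyError here flags a variant that has not been mapped yet.
    return STATE_VARIANTS[cleaned.lower()]


def to_state_name(value):
    """Translate a code (or variant) to the readable string."""
    return STATE_NAMES[to_state_code(value)]
```

For example, `to_state_name("Fla.")` and `to_state_name("FL")` both resolve to the same readable value, so the warehouse holds one representation regardless of source.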
Transforming Data
Transform: Change, Combine, Calculate
Figure: Example tables for the three transformations, using buyers Barr, Adam; Chai, Sean; and O’Melia, Erin.
Change: reg_id values 2, 4, 6 appear as II, IV, VI
Combine: buyer_name appears alongside separate buyer_first and buyer_last columns
Calculate: total_sales values 17.60, 52.80, 8.82 appear alongside price_id values .55, 1.10, .98 and qty_id values 32, 48, 9
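The three transformations named on this slide (change, combine, calculate) can be sketched in Python using the slide's sample values. The function names are hypothetical, and the sketch assumes that "combine" means joining first and last names into one buyer_name value; the slide does not state the direction.

```python
from decimal import Decimal

# Only the region values that appear on the slide.
ROMAN = {2: "II", 4: "IV", 6: "VI"}


def change_region(reg_id):
    """Change: represent a numeric region id as a Roman numeral."""
    return ROMAN[reg_id]


def combine_name(first, last):
    """Combine: join two columns into one 'Last, First' value."""
    return f"{last}, {first}"


def calculate_total(price, qty):
    """Calculate: derive total_sales from price and quantity.

    Decimal avoids binary floating-point rounding in currency values.
    """
    return Decimal(price) * qty
```

With the slide's first row, `calculate_total("0.55", 32)` gives 17.60, which matches the total_sales value shown for Barr, Adam.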
Methods of Populating a Data Warehouse
Select the Method of Populating a Data Warehouse That Suits Your Business Needs
Method 1: Validate, combine, and transform data in a temporary data staging area
Method 2: Validate, combine, and transform data during the loading process
Migrate Data During Periods of Relatively Low System Use
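Method 1 can be sketched with Python's built-in sqlite3 module standing in for the database server. The table names, columns, and sample rows are hypothetical; the point is only that raw rows land in a staging table first and are validated and transformed on the way into the warehouse table. Method 2 would apply the same filter and calculation while reading from the source, with no staging table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_sales (buyer_name TEXT, price REAL, qty INTEGER);
    CREATE TABLE warehouse_sales (buyer_name TEXT, total_sales REAL);
""")

# Hypothetical source rows; the last one has no buyer name.
source_rows = [("Barr, Adam", 0.55, 32), ("Chai, Sean", 1.10, 48), (None, 0.98, 9)]

# Step 1: copy the raw rows into the temporary staging area unchanged.
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", source_rows)

# Step 2: validate (drop rows without a name) and transform (calculate
# total_sales) while moving rows from staging into the warehouse table.
conn.execute("""
    INSERT INTO warehouse_sales (buyer_name, total_sales)
    SELECT buyer_name, ROUND(price * qty, 2)
    FROM staging_sales
    WHERE buyer_name IS NOT NULL
""")
conn.commit()

loaded = conn.execute(
    "SELECT buyer_name, total_sales FROM warehouse_sales ORDER BY buyer_name"
).fetchall()
```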
Tools for Populating a Data Warehouse
What Is the Appropriate Tool to Use
Transact-SQL Query
Distributed Query
bcp Utility and the BULK INSERT Statement
DTS
What Is the Appropriate Tool to Use
Format of Source and Destination Data
Location of Source and Destination Data
Import or Export of Database Objects
Frequency of Data Transfer
Interface Preference
Tool Performance
Transact-SQL Query
Source table: Customer
    FirstName   LastName
    Steve       Johnson
    Douglas     Smith
    Les         Wilson
    Paul        Salinger
Result table: CustomerSummary
    FullName
    Johnson, Steve
    Smith, Douglas
    Wilson, Les
    Salinger, Paul
USE northwind_mart
SELECT Lastname + ', ' + Firstname AS Fullname
INTO CustomerSummary
FROM Northwind.dbo.Customer
Distributed Query
USE northwind_mart
SELECT productname, companyname
INTO item_dim
FROM StockServer.sales.dbo.products p
JOIN AccountingServer.sales.dbo.suppliers s
ON p.supplierid = s.supplierid
Figure: The local SQL Server joins the Products table in the Sales database on StockServer with the Suppliers table in the Sales database on AccountingServer to create the Item_Dim table.
bcp Utility and the BULK INSERT Statement
BULK INSERT Statement
    BULK INSERT Accounting.dbo.orders
    FROM 'C:\ordersdir\orderstble.dat'
    WITH (
        DATAFILETYPE = 'char',
        FIELDTERMINATOR = '|',
        ROWTERMINATOR = '|\n'
    )
bcp Utility
    bcp accounting.dbo.orders in Orderstbl.dat -c -t, -r \n -Smysqlserver -Usa -Pmypassword
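The field and row terminators in the BULK INSERT example describe a character file in which every field, including the last one on a line, ends with a pipe. A minimal Python sketch of how such a file splits into rows and fields follows; the function name and the sample text are hypothetical, and this is an analogy for the file layout, not a description of how SQL Server parses the file internally.

```python
def parse_char_file(text, field_terminator="|", row_terminator="|\n"):
    """Split character data into rows, then each row into fields."""
    rows = []
    for record in text.split(row_terminator):
        if record:  # skip the empty piece after the final row terminator
            rows.append(record.split(field_terminator))
    return rows


# Hypothetical file contents: two rows, three fields each.
sample = "1|Barr|17.60|\n2|Chai|52.80|\n"
parsed = parse_char_file(sample)
```

Using a row terminator of a pipe followed by a newline keeps the trailing pipe from being read as an extra empty field.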
DTS
When to Use DTS
DTS Data Source and Destination Types
OLE DB
ODBC
ASCII text file
Custom
HTML
Spreadsheet
DTS Tools
DTS Import and Export wizards
DTS Designer
dtsrun utility
Populating a Data Warehouse by Using DTS
Building a DTS Package
Transforming Data by Using an ActiveX Script
Transforming Data by Using a Lookup Query
Defining Transactions
Tracking Data Lineage
Creating a DTS Package Programmatically
Building a DTS Package
Mapping Source and Destination Data
Defining Data Transformation Tasks
Creating and Saving a DTS Package
Executing a DTS Package
Scheduling and Securing a DTS Package
Mapping Source and Destination Data
Mapping Columns
Decide which columns to copy
Choose the columns in the target database that map to the source columns
Mapping Data Types
Specify transformation rules
Specify levels of data conversion
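Column and data type mapping can be pictured as a table that pairs each destination column with a source column and a conversion rule. The Python sketch below is an analogy only: the column names, the mapping structure, and the sample row are hypothetical and do not represent DTS's own mapping format.

```python
from datetime import date

# Hypothetical mapping: destination column -> (source column, conversion rule).
COLUMN_MAP = {
    "customer_name": ("FullName", str.strip),
    "order_qty": ("Qty", int),
    "order_date": ("OrderDate", date.fromisoformat),
}


def map_row(source_row):
    """Copy only the mapped columns, converting each to its target type."""
    return {
        dest: convert(source_row[src])
        for dest, (src, convert) in COLUMN_MAP.items()
    }


mapped = map_row(
    {"FullName": " Johnson, Steve ", "Qty": "32", "OrderDate": "1999-01-15", "Ignored": "x"}
)
```

Source columns that are not in the mapping, such as `Ignored` here, are simply not copied.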
Defining Data Transformation Tasks
DTS Packages Contain Tasks
A Task Can:
Execute a Transact-SQL statement
Execute a script
Launch an external application
Transfer SQL Server 7.0 objects
Execute or retrieve results from a DTS package
Creating and Saving a DTS Package
Creating a DTS Package
By using DTS wizards
By using DTS Designer
By using a COM interface exposed by DTS
Saving a DTS Package
COM-structured storage file
Microsoft Repository
SQL Server msdb database
Executing a DTS Package
You Can Execute a DTS Package by Using SQL Server Enterprise Manager or the dtsrun Command Prompt Utility
File Storage Location Determines the dtsrun Syntax
dtsrun /sAccounts /uJose /nOrdersImport
Scheduling and Securing a DTS Package
Scheduling a DTS Package
Use DTS Import or DTS Export wizards when you save the DTS package to the msdb database
Use SQL Server Enterprise Manager when you use the dtsrun command prompt utility
Implementing DTS Package Security
Login permissions
Owner and user passwords
Demonstration: Transferring Data by Using DTS
Transforming Data by Using an ActiveX Script
Why Use an ActiveX Script
How to Use an ActiveX Script
Define a function to contain the transformation script
Specify the destination column
Specify the source columns
Use operators and VBScript or JScript functions and control-of-flow statements
Set the return code value for the function
How to Handle Errors with Return Codes
Examples of ActiveX Scripts
Source table: Customer
    FirstName   LastName
    Steve       Johnson
    Douglas     Smith
    Les         Wilson
    Paul        Salinger
Result table: CustomerSummary
    FullName
    Johnson, Steve
    Smith, Douglas
    Wilson, Les
    Salinger, Paul
Function Main()
    DTSDestination("FullName") = DTSSource("Lastname") + ", " + DTSSource("Firstname")
    Main = DTSTransformStat_OK
End Function
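The idea of a transformation function that reports its outcome through a return code can be sketched in Python. This is an analogy, not DTS code: the `TransformStatus` values are hypothetical names loosely modeled on the `DTSTransformStat_OK` constant in the script on this slide, and the rule for skipping a row is an assumption.

```python
from enum import Enum


class TransformStatus(Enum):
    """Hypothetical return codes for a row transformation."""
    OK = "ok"
    SKIP_ROW = "skip_row"


def transform(source, destination):
    """Fill the destination row and return a status code for the caller."""
    last, first = source.get("Lastname"), source.get("Firstname")
    if not last or not first:
        # Signal the problem through the return code instead of raising.
        return TransformStatus.SKIP_ROW
    destination["FullName"] = f"{last}, {first}"
    return TransformStatus.OK


def run(rows):
    """Apply the transformation to each row and act on its return code."""
    loaded, skipped = [], 0
    for source in rows:
        destination = {}
        if transform(source, destination) is TransformStatus.OK:
            loaded.append(destination)
        else:
            skipped += 1
    return loaded, skipped
```

The caller decides what each code means, for example loading the row, skipping it, or stopping the run.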
Demonstration: Transforming Data by Using an ActiveX Script
Transforming Data by Using a Lookup Query
Source Data: Customer_source
    Name          State
    D. Smith      FL
    L. Wilson     WY
    P. Salinger   AR
Lookup Table: State_lookup
    Abbreviation  State
    FL            Florida
    WY            Wyoming
    AR            Arkansas
Destination Data: Customer_dim
    Name          State
    D. Smith      Florida
    L. Wilson     Wyoming
    P. Salinger   Arkansas
Transform: each state abbreviation in the source is replaced by the full state name from the lookup table.
Implementing a Lookup Query
Set Up Connections to Source, Destination, and Lookup Tables
Create a Task, and Specify the Source and Destination
Add a Lookup Query Definition
Map the Source and Destination Columns, and Call the Lookup Query from the ActiveX Script
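The lookup pattern can be sketched with Python's sqlite3 module in place of DTS connections. The table names and data come from the lookup example in this deck; the single in-memory connection, the `lookup_state` helper, and the parameterized query text are assumptions made for illustration.

```python
import sqlite3

# One in-memory database stands in for the source, destination,
# and lookup connections.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE State_lookup (Abbreviation TEXT PRIMARY KEY, State TEXT);
    INSERT INTO State_lookup VALUES
        ('FL', 'Florida'), ('WY', 'Wyoming'), ('AR', 'Arkansas');
    CREATE TABLE Customer_source (Name TEXT, State TEXT);
    INSERT INTO Customer_source VALUES
        ('D. Smith', 'FL'), ('L. Wilson', 'WY'), ('P. Salinger', 'AR');
    CREATE TABLE Customer_dim (Name TEXT, State TEXT);
""")

# The lookup query definition: one parameter, one returned value.
LOOKUP_SQL = "SELECT State FROM State_lookup WHERE Abbreviation = ?"


def lookup_state(abbreviation):
    """Run the lookup query for one source value (None if not found)."""
    row = conn.execute(LOOKUP_SQL, (abbreviation,)).fetchone()
    return row[0] if row else None


# For each source row, call the lookup and write the destination row.
for name, abbreviation in conn.execute("SELECT Name, State FROM Customer_source").fetchall():
    conn.execute("INSERT INTO Customer_dim VALUES (?, ?)", (name, lookup_state(abbreviation)))
conn.commit()

customer_dim = conn.execute("SELECT Name, State FROM Customer_dim ORDER BY Name").fetchall()
```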
Defining Transactions
You Must Explicitly Add a Step or Task to the Transaction
You Can Specify When a Transaction Commits
DTS Only Supports One Transaction Per Package
MS DTC Must Be Running
The Data Provider for the Data Destination Must Support Transactions
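The all-or-nothing behavior of a transaction can be sketched with Python's sqlite3 module. This covers a single local connection only, so it does not involve MS DTC or distributed transactions; the table, the `run_steps` helper, and the sample statements are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, qty INTEGER NOT NULL)")
conn.commit()


def run_steps(conn, steps):
    """Run every step inside one transaction; commit only if all succeed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for sql, params in steps:
                conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False


insert = "INSERT INTO orders VALUES (?, ?)"
first_run = run_steps(conn, [(insert, (1, 5)), (insert, (2, 7))])
# The second step violates NOT NULL, so the first step is rolled back too.
second_run = run_steps(conn, [(insert, (3, 1)), (insert, (4, None))])
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```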
Tracking Data Lineage
Using Data Lineage
Tracks history of data at package and table row levels
Provides audit trail of data transformation and DTS package execution
Implementing Data Lineage
Create the table columns in the data warehouse
Add data lineage variables to the DTS package
Map data lineage source and destination columns
Viewing Data Lineage
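Row-level lineage can be sketched with Python's sqlite3 module: each load run gets an identifier, and every row written by that run carries it. This is a conceptual analogy, not the DTS mechanism; the table names, the `lineage_id` column, and the `load_with_lineage` helper are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Lineage column in the warehouse table, plus a table of package runs.
conn.execute("CREATE TABLE sales_fact (buyer_name TEXT, total_sales REAL, lineage_id TEXT)")
conn.execute("CREATE TABLE package_runs (lineage_id TEXT PRIMARY KEY, package_name TEXT)")


def load_with_lineage(conn, package_name, rows):
    """Load rows and stamp each one with the identifier of this run."""
    lineage_id = str(uuid.uuid4())  # one lineage value per package execution
    with conn:
        conn.execute("INSERT INTO package_runs VALUES (?, ?)", (lineage_id, package_name))
        conn.executemany(
            "INSERT INTO sales_fact VALUES (?, ?, ?)",
            [(name, total, lineage_id) for name, total in rows],
        )
    return lineage_id


run_id = load_with_lineage(conn, "OrdersImport", [("Barr, Adam", 17.60), ("Chai, Sean", 52.80)])

# Viewing lineage: trace warehouse rows back to the run that loaded them.
traced = conn.execute(
    """
    SELECT p.package_name, COUNT(*)
    FROM sales_fact f JOIN package_runs p ON p.lineage_id = f.lineage_id
    WHERE f.lineage_id = ?
    GROUP BY p.package_name
    """,
    (run_id,),
).fetchall()
```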
Demonstration: Defining Transactions and Tracking Data Lineage
Creating a DTS Package Programmatically
Figure: DTS package object model. Labels shown: DTS Package; Connections; Global Variables; Steps; Precedence Constraints; Tasks (Send Mail, Bulk Insert, Transfer Objects, Execute SQL, Data-driven Query, Custom, ActiveX, Create Process, Data Pump); Source; Destination; Columns.
Recommended Practices
Correct and Validate Data at the Source
Use an ActiveX Script or a Transact-SQL Script to Transfer and Transform Data
Use a Temporary Data Storage Area
Save and Store DTS Packages in the Microsoft Repository to Maintain Data Lineage
Lab A: Populating a Data Warehouse
Review
Process Overview
Methods of Populating a Data Warehouse
Tools for Populating a Data Warehouse
Populating a Data Warehouse by Using DTS