Upload: ruth-chapman

Post on 24-Dec-2015


Chapter 5

Using SAS® ETL Studio

Section 5.1

SAS ETL Studio Overview

3

What Is SAS ETL Studio?

SAS ETL Studio, a Java application, is a visual design tool that helps organizations quickly build, implement, and manage ETL processes from source to destination, regardless of the data sources or platforms.

Users can standardize metadata across the organization and perform in-depth transformations with minimal programming or manual work to meet enterprise data integration requirements and to support business and analytic intelligence.

4

What Is SAS ETL Studio?

SAS ETL Studio enables you to perform the following tasks:

the Extraction of data from operational data stores

the Transformation of this data

the Loading of the extracted data into your data warehouse or data mart.
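The three tasks above can be sketched in a few lines of Python. This is only an illustration of the extract, transform, and load pattern, not code that SAS ETL Studio generates. The table names (orders, order_totals), the column names, and the in-memory SQLite databases are all assumptions made for the example.

```python
import sqlite3


def extract(source_conn):
    """Extract: read rows from a (hypothetical) operational table."""
    return source_conn.execute(
        "SELECT order_id, quantity, unit_price FROM orders"
    ).fetchall()


def transform(rows):
    """Transform: derive a total for each extracted row."""
    return [(order_id, quantity * unit_price)
            for order_id, quantity, unit_price in rows]


def load(warehouse_conn, rows):
    """Load: write the transformed rows into a (hypothetical) warehouse table."""
    warehouse_conn.execute(
        "CREATE TABLE IF NOT EXISTS order_totals (order_id INTEGER, total REAL)"
    )
    warehouse_conn.executemany("INSERT INTO order_totals VALUES (?, ?)", rows)
    warehouse_conn.commit()


def run_etl():
    """Run the three steps against two in-memory databases."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (order_id INTEGER, quantity INTEGER, unit_price REAL)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)", [(1, 2, 10.0), (2, 1, 4.5)]
    )
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(source)))
    return warehouse.execute(
        "SELECT order_id, total FROM order_totals ORDER BY order_id"
    ).fetchall()
```

Calling run_etl() returns the rows that ended up in the warehouse table, which makes it easy to check each step separately.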

5

What Is SAS ETL Studio?

SAS ETL Studio is an application that enables you to manage ETL process flows by allowing

specification of metadata for sources, such as tables in an operational system

specification of metadata for targets – the tables and other data stores in a data warehouse

creation of jobs that specify how data is extracted, transformed, and loaded from a source to a target.

6

SAS ETL Studio: Change Management

In SAS ETL Studio, the change management facility enables multiple SAS ETL Studio users to work with the same metadata repository at the same time without overwriting each other’s changes.

7

SAS ETL Studio: Data Surveyor Wizards

Optional Data Surveyor wizards can be licensed that provide access to the metadata in enterprise applications, such as

PeopleSoft

SAP R/3

Siebel

Oracle Applications.

8

SAS ETL Studio: Metadata CWM Compliant

The metadata maintained by SAS ETL Studio is CWM (Common Warehouse Metamodel) compliant and portable to other CWM-compliant applications. Likewise, metadata from other CWM-compliant applications (for example, data modeling tools) can be imported easily into SAS ETL Studio.

9

SAS ETL Studio: Data Quality

SAS ETL Studio is fully integrated with the data quality software from DataFlux Corporation. Both products now use the same Quality Knowledge Base (QKB), which contains rules, routines, and schemes necessary to integrate data quality into the ETL process.

10

Extending SAS ETL Studio Functionality

The SAS ETL Studio functionality is extended by Java plug-ins packaged with the product.

Further extensions can be implemented by

writing additional plug-ins (Java programming required)

using the Transformation Generator wizard (no Java programming required).

11

Server Connections and SAS ETL Studio

As a client, SAS ETL Studio must connect to a SAS Metadata Server to read or write metadata. It must connect to other servers to run SAS code, connect to a third-party database management system, or perform other tasks.

12

Interaction with SAS Application Servers

SAS ETL Studio can use different types of application servers:

SAS Metadata Server: Required to read and write metadata in a SAS metadata repository.

SAS Workspace Server: Required to execute SAS code and access data.

SAS/CONNECT Server: Required to submit generated SAS code to machines that are remote to the default SAS application server.

...

Section 5.2

The SAS ETL Studio Interface

14

SAS ETL Studio: The Interface

SAS ETL Studio is a Java client developed to control the ETL process. The interface has several “ease-of-use” features, including

copy and paste in any text field

multiple windows open at one time (including multiple process flow diagrams)

Windows look and feel

wizard-driven interfaces.

15

Tools, Menus, and Online Help

SAS ETL Studio takes full advantage of toolbars and pull-down menus. The icons available on the toolbar depend on which window is active from within the interface.

Menus and Tools

16

The Shortcut Bar

One of the most significant features of SAS ETL Studio is the new process-driven functionality.

Processes are available via a Shortcut bar on the far left side of the main SAS ETL Studio window.

Shortcut Bar

17

The Shortcut Bar

The Shortcut bar is populated with icons for each task an ETL user would typically perform, including:

Source Designer: defines metadata about the source(s) for a process.

Metadata Importer: imports metadata from other applications.

Metadata Exporter: exports metadata to be used by other applications.

Process Designer: defines metadata about the ETL processes.

...continued...

18

The Shortcut Bar

Target Designer: defines metadata about the target table(s) to be created by the process.

Options: provides numerous options for the SAS ETL Studio user to customize the look and feel of the application.

...

19

Tree View

The SAS ETL Studio Tree View enables you to

view the metadata associated with the current metadata repository

display different views or “trees” of the current repository.

Tree View

20

Tree View

There are several tabs available in the tree view area:

Inventory Tree: lists the metadata objects in the default metadata repository (and any dependent repositories), organized by predetermined groupings.

...continued...

21

Tree View

...continued...

Custom Tree: lists the metadata objects in the default metadata repository (and any dependent repositories), organized by user-defined groupings of objects.

22

Tree View

Process Library Tree: lists the available data transformations to be used in the ETL process.

...

23

Process Library Tree

The Process Library tree displays a collection of transformation templates.

There are four collections (folders) of templates that are provided with SAS ETL Studio:

Analysis

Data Transforms

Output

Publish.

24

Process Designer View

The Process Designer window is the workspace for building ETL processes. The Process Designer view appears as a final step in the Process Designer wizard.

Once the process is defined, the Process Designer view is populated with icons that represent the chosen processes.

The Process Designer window can be used to

view SQL source code

review the SAS log (from submitting jobs)

view the resulting output from running a SAS job.

25

Process Designer and Overview Windows

Process Designer View

Overview window

...

26

Overview Window

The Overview window shows you the complete process from the process view.

From within the Overview window, you can control which part of the process is displayed in the Process View window.

27

SAS ETL Studio Wizards

There are shortcuts that invoke wizards to aid the user in performing various tasks with SAS ETL Studio.

Some of these wizards are

Source Designer

Target Designer

New Job.

28

Source Designer

The Source Designer is a wizard-driven interface that enables you to define the physical layout of existing tables using a data dictionary or metadata information from the source system.

The result of running the Source Designer successfully is a metadata registration that describes the data source.

29

Target Designer

The Target Designer is a wizard that allows metadata to be entered for a target.

In designing the target table, you can

access any metadata about any source tables and columns registered in the metadata repository

override any metadata that was imported from another source and add new columns to the target table

create indexes on the target table being created.

30

Target Designer

The person designing the target table has full control over the type of table being built.

The types of targets that can be built include

database types that are supported by the SAS/ACCESS products

SAS data sets (including both data files and data views)

SAS/SHARE data sets

SPDE tables.

31

New Job Wizard

The New Job wizard enables you to define the metadata necessary to run an ETL process to load data into a target or targets.

32

Additional Wizards

Other wizards available to provide assistance with various tasks in SAS ETL Studio include

Metadata Importer

Metadata Exporter

Cube Designer

Transformation Generator wizard.

You can also install optional data surveyor wizards, which provide access to the metadata in enterprise applications, such as PeopleSoft, SAP R/3, Siebel, and Oracle.

33

Options Window

The Options window can be used to define standard settings for the SAS ETL Studio interface.

There are several tabs in the Options window:

General

Process Editor

Metadata Tree

SAS Server

Data Quality.

34

Course Case Study Tasks

Recall the case study tasks diagram discussed earlier. Each of these tasks involves either reading or writing (or both) metadata.

Register Source Tables

Define Data Libraries

Create ETL Jobs

Define Target Tables

Create OLAP Cubes

View and Analyze Data

Create Stored Processes

Create Reports

Create Information Maps

Use the Information Delivery Portal

Metadata

35

SAS ETL Studio Case Study Tasks

SAS ETL Studio will concentrate on the following four tasks:

Register Source Tables

Define Data Libraries

Create ETL Jobs

Define Target Tables

36

SAS ETL Studio Case Study

These tasks will be performed in sequence:

1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs

37

SAS ETL Studio Case Study – Setup Tasks

1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs

Build Custom Tree Groupings: Libraries, Jobs, Source Tables, Target Tables (Demo, Exercises)

Define Additional Library Definitions: Target Tables Library, Source Tables Library (Demo, Exercises)

...

38

SAS ETL Studio Case Study – Define Sources

1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs

The Source Designer defines metadata for the source tables.

Orders

Order_Item

Product_List

Demo

Exercises

...

39

SAS ETL Studio Case Study – Define Targets

1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs

The Target Designer defines metadata for the target tables.

OrderFact

ProductDim

Demo*

Exercises

* Some derived columns for OrderFact are completed in the exercises.

...

40

SAS ETL Studio Case Study – Define Jobs

1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs

The Process Designer defines metadata for jobs that contain the process flow diagrams necessary to load the target tables.

Demo

Exercises

Populate the OrderFact table

Populate the ProductDim table

...

41

Creating the OrderFact Table

The OrderFact table will be created from the Orders and Order_Item tables.

Target Table

Source Tables

...

42

Creating the OrderFact Table

The source tables, Orders and Order_Item, will be combined using the SQL Join transformation.

SQL Join

The SQL Join will be used to define computed columns.
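As a rough illustration of what such a join does, the following Python sketch joins two small tables and defines one computed column. Only the table names Orders, Order_Item, and OrderFact come from the case study. The column names (Order_ID, Customer_ID, Product_ID, Quantity, Unit_Price, Total_Price), the sample rows, and the use of SQLite are assumptions for the example. The sketch also creates the target directly from the join, whereas the case study loads the join result with a separate Loader step.

```python
import sqlite3


def build_order_fact():
    """Join Orders and Order_Item and add a computed column (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Orders (Order_ID INTEGER, Customer_ID INTEGER);
        CREATE TABLE Order_Item (
            Order_ID INTEGER, Product_ID INTEGER,
            Quantity INTEGER, Unit_Price REAL
        );
        INSERT INTO Orders VALUES (1, 100);
        INSERT INTO Orders VALUES (2, 101);
        INSERT INTO Order_Item VALUES (1, 10, 2, 5.0);
        INSERT INTO Order_Item VALUES (1, 11, 1, 3.0);
        INSERT INTO Order_Item VALUES (2, 10, 4, 5.0);
        """
    )
    # Total_Price is the computed column: it exists in neither source table.
    conn.execute(
        """
        CREATE TABLE OrderFact AS
        SELECT o.Order_ID, o.Customer_ID, i.Product_ID,
               i.Quantity * i.Unit_Price AS Total_Price
        FROM Orders AS o
        INNER JOIN Order_Item AS i ON o.Order_ID = i.Order_ID
        """
    )
    return conn.execute(
        "SELECT Order_ID, Product_ID, Total_Price FROM OrderFact "
        "ORDER BY Order_ID, Product_ID"
    ).fetchall()
```

Each row of the result corresponds to one order item, carrying the order-level columns alongside the computed total.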

...

43

Creating the OrderFact Table

The table that is the result of the SQL Join will then be loaded into the OrderFact table.

Loader

...

44

Creating the ProductDim Table

The ProductDim table will be created from the Product_List table.

Target Table

Source Table

...

45

Creating the ProductDim Table

The Extract transformation will be used so that a computed column can be defined.

SAS Extract

...

46

Creating the ProductDim Table

The results of the Extract transformation will then be loaded into the target table, ProductDim.

Loader

...

47

SAS ETL Studio Case Study – Setup Tasks

1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs

Build Custom Tree Groupings: Libraries, Jobs, Source Tables, Target Tables (Demo, Exercises)

Define Additional Library Definitions: Target Tables Library, Source Tables Library (Demo, Exercises)

48

This demonstration shows how to define a logical grouping object and create a library definition to store in the new grouping.

Creating a Logical Grouping and Adding a Library Definition


49

This exercise creates logical grouping elements and defines two SAS libraries.

Exercises


50

Using the Source Designer

The Source Designer is a wizard that generates metadata for one or more selected tables, based on the physical structure of the table(s).

The Source Designer can be used to specify metadata for any existing table, not just tables used as data sources for ETL jobs.
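The idea of deriving metadata from a table's physical structure can be illustrated outside SAS. The sketch below reads column names and declared types from an existing SQLite table and returns them as a metadata record. It is not how the Source Designer is implemented; the function name register_table, the record layout, and the example columns are assumptions for the illustration.

```python
import sqlite3


def register_table(conn, table_name):
    """Build a metadata record from the physical structure of an existing table."""
    # PRAGMA table_info returns one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
    if not rows:
        raise ValueError(f"table {table_name!r} does not exist")
    return {
        "table": table_name,
        "columns": [{"name": row[1], "type": row[2]} for row in rows],
    }


def example():
    """Register a small, hypothetical Orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (Order_ID INTEGER, Order_Date TEXT)")
    return register_table(conn, "Orders")
```

The record describes the table without copying any of its data, which is the sense in which a registration is metadata only.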

51

Using the Source Designer

The Source Designer is an easy-to-use wizard interface.

...

52

SAS ETL Studio Case Study – Define Sources

1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs

The Source Designer defines metadata for the source tables.

Orders

Order_Item

Product_List

Demo

Exercises

53

This demonstration shows how to add a source table definition for the Orders table.

Add a Source Table Definition


54

These exercises add source table definitions for several source tables.

Exercises


55

Using the Target Designer

The Target Designer is a wizard that can create new metadata about a single table that might or might not already exist in physical storage.

It can also be used to create and edit metadata about an OLAP cube.

56

Using the Target Designer

The Target Designer is an easy-to-use wizard interface.

...

57

SAS ETL Studio Case Study – Define Targets

1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs

The Target Designer defines metadata for the target tables.

OrderFact

ProductDim

Demo*

Exercises

* Some derived columns for OrderFact are completed in the exercises.

58

This demonstration shows how to create a target table definition for the OrderFact table.

Defining a Target Table


59

These exercises add target table definitions for several tables.

Exercises


60

Using the Process Designer

The Process Designer invokes the New Job wizard to create metadata about a job. That metadata is used to build a process flow diagram for the job.

A job is a metadata object that specifies processes that create output. SAS ETL Studio organizes sources, targets, and transformations into jobs that can be displayed in a process flow diagram.

SAS ETL Studio uses each job to generate and/or retrieve SAS code that reads sources and creates targets on a file system.
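One way to picture a job as a metadata object is a small record that names its sources, transformations, and target, from which a process flow can be rendered. The Python sketch below is a hypothetical model for illustration only. The class name Job, its fields, and the describe_flow method are assumptions and do not reflect how SAS ETL Studio stores job metadata or generates SAS code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    """Hypothetical job metadata: sources, ordered transformations, one target."""
    name: str
    sources: List[str]
    target: str
    transformations: List[str] = field(default_factory=list)

    def describe_flow(self) -> str:
        """Render the process flow as text: sources, then each step, then target."""
        steps = [", ".join(self.sources)] + self.transformations + [self.target]
        return " -> ".join(steps)


def order_fact_job() -> Job:
    """The OrderFact flow from the case study, expressed with this model."""
    return Job(
        name="Populate OrderFact",
        sources=["Orders", "Order_Item"],
        target="OrderFact",
        transformations=["SQL Join", "Loader"],
    )
```

Because the job holds only names and ordering, the same record could drive a diagram, a textual summary, or a code generator.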

61

Using the Process Designer

The New Job wizard prompts for information that is used to build a template in the Process Designer.

...

62

SAS ETL Studio Case Study – Define Jobs

1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs

The Process Designer defines metadata for jobs that contain the process flow diagrams necessary to load the target tables.

Demo

Exercises

Populate the OrderFact table

Populate the ProductDim table

63

This demonstration shows how to define a job for the OrderFact target table and enter information about the extraction and transformation of data.

Defining a Job


64

This exercise shows how to create a new job and enter information about the extraction and transformation of data.

Exercises


65

This demonstration shows how to specify the load process attributes and how to execute and verify a job.

Loading the Target Tables


66

This exercise shows how to specify the load process attributes and how to execute and verify a job.

Exercises


Section 5.3

Advanced SAS ETL Studio Features (Self-Study)

68

Advanced Features

This section introduces several advanced features to review on your own, including:

Data Quality Plug-Ins

Importing and Exporting Metadata

Change Management.

69

Data Quality Plug-Ins

SAS ETL Studio contains two data quality transformation templates in the Process Library tree:

Create Match Code: Used to create a job that creates match codes and cluster numbers for a specified source column, based on a set of criteria.

Apply Lookup Standardization: Used to create a job that standardizes the values of a source column according to the contents of a specified standardization scheme.

These templates increase the value of data through data analysis and data cleansing.

...

70

Data Quality Plug-Ins

To use the data quality transformation templates,

the SAS Data Quality Server software must be installed

a SAS application server must be configured to access a Quality Knowledge Base

the Quality Knowledge Base must contain the locales needed to reference data quality jobs.

When the prerequisites have been met, the data quality transformations can be dragged into process flow diagrams.

71

Create Match Code Plug-In

The Create Match Code plug-in is a tabbed dialog box that

reads the Quality Knowledge Base for the specified locale

creates match codes based on the user-specified criteria.

The match code can then be used to de-duplicate data or join data as part of the transformation step in defining the target table.
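To make the de-duplication idea concrete, here is a deliberately simple Python sketch. The match_code function below is a toy rule invented for this example (uppercase, drop punctuation, drop vowels after the first character, collapse repeats). It is not the algorithm used by the Quality Knowledge Base, whose rules are locale-specific and far richer.

```python
import re


def match_code(value, length=8):
    """Toy match code: NOT the QKB algorithm, only an illustration of the idea."""
    cleaned = re.sub(r"[^A-Z0-9]", "", value.upper())
    if not cleaned:
        return ""
    # Keep the first character, then remove vowels from the remainder.
    tail = re.sub(r"[AEIOU]", "", cleaned[1:])
    code = cleaned[0]
    for ch in tail:
        if ch != code[-1]:  # collapse runs of the same character
            code += ch
    return code[:length]


def deduplicate(values):
    """Keep the first value seen for each match code."""
    first_by_code = {}
    for value in values:
        first_by_code.setdefault(match_code(value), value)
    return list(first_by_code.values())
```

Values that differ only in case, spacing, or punctuation receive the same code, so deduplicate keeps one representative of each group; the same codes could serve as a join key between two tables.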

72

Apply Lookup Standardization Plug-In

The Apply Lookup Standardization plug-in is a tabbed dialog box that

reads the Quality Knowledge Base for the specified locale

loads all of the available standardization schemes.

You can then apply the scheme to one of the source columns as part of the transformation step in defining the target table.
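Applying a standardization scheme to a column amounts to a lookup from variant spellings to one standard value. The sketch below shows that lookup in Python. The scheme contents, the name STATE_SCHEME, and the normalization used for matching (trimming whitespace and ignoring case) are assumptions for the example; real schemes come from the Quality Knowledge Base.

```python
# Hypothetical scheme: variant spellings mapped to one standard value.
STATE_SCHEME = {
    "NC": "North Carolina",
    "N.C.": "North Carolina",
    "N CAROLINA": "North Carolina",
}


def apply_scheme(values, scheme):
    """Replace each value with its standard form when the scheme has an entry."""
    standardized = []
    for value in values:
        # Normalize whitespace and case so that lookups are forgiving.
        key = " ".join(value.split()).upper()
        # Values without an entry pass through unchanged.
        standardized.append(scheme.get(key, value))
    return standardized
```

For example, apply_scheme(["nc", "N.C.", "Texas"], STATE_SCHEME) standardizes the first two values and leaves the third as it was.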

73

Metadata Import Wizard

The Metadata Import Wizard is an interface for importing metadata files that are compliant with the Common Warehouse Metamodel (CWM) standard.

By using the Import Wizard to import the metadata from a previously defined data model (source tables or target tables), you do not have to enter the metadata for each table individually. You simply reference a location for the model file, which was created by a third-party modeling tool.

74

Which Metadata Can Be Imported?

The CWM standard for metadata was developed by the Object Management Group (OMG).

More information about OMG and the CWM metadata standard can be obtained from: http://www.omg.org

More information about Meta Integration Technology, Inc., and the purchase of MIMBs, can be obtained from the following location: http://www.metaintegration.net

75

Metadata Export Wizard

The Metadata Export Wizard is an interface for exporting metadata from within SAS ETL Studio to third-party CWM-compliant applications.

The user has the ability to specify the path and the file to create from the export of the metadata.

Once the user completes the Metadata Exporter wizard, a confirmation window verifies all of the selections the user has made for the export of the metadata. Upon exiting this window, the metadata is written to the external file that was specified in the wizard.

76

Change Management

SAS ETL Studio enables you to create metadata objects that define sources, targets, and the transformations that connect them. These objects are saved to one or more metadata repositories.

The change management feature (or more specifically, metadata source control) enables multiple SAS ETL Studio users to work with the same metadata repository at the same time without overwriting each other's changes.

77

Change Management

Change management features in SAS ETL Studio include:

menus that support change management operations such as check out and check in

the Inventory tree and the Custom tree for working with metadata that is contained in a change-managed repository

the Project tree for working with metadata that is contained in a project repository

an audit history for each metadata object.

78

Change Management

After an object has been checked out by one person, it is locked so that it cannot be updated by another person until the object has been checked back in.

The only people who can change the metadata in a change-managed repository are

the person who started the metadata server

administrators who have write access to the repository

any users who are authorized to use a project repository for the change-managed repository.
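The check-out and check-in locking described above can be sketched as a small in-memory model. This is a hypothetical illustration of the locking behavior only. The class ChangeManagedRepository and its methods are invented for the example, and the sketch ignores authorization, project repositories, and audit history.

```python
class ChangeManagedRepository:
    """Toy repository: an object checked out by one user is locked for others."""

    def __init__(self):
        self._objects = {}  # object name -> definition (a dict)
        self._locks = {}    # object name -> user holding the check-out

    def add(self, name, definition):
        """Store a new metadata object."""
        self._objects[name] = dict(definition)

    def check_out(self, name, user):
        """Lock the object for this user and return a working copy."""
        owner = self._locks.get(name)
        if owner is not None and owner != user:
            raise PermissionError(f"{name} is checked out by {owner}")
        self._locks[name] = user
        return dict(self._objects[name])

    def check_in(self, name, user, definition):
        """Save the user's changes and release the lock."""
        if self._locks.get(name) != user:
            raise PermissionError(f"{name} is not checked out by {user}")
        self._objects[name] = dict(definition)
        del self._locks[name]

    def get(self, name):
        """Read the current definition without locking it."""
        return dict(self._objects[name])
```

A second user who tries to check out a locked object gets an error instead of silently overwriting the first user's work, which is the guarantee the slide describes.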